bug-grep

bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error


From: Vincent Lefevre
Subject: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error
Date: Fri, 12 Sep 2014 03:42:47 +0200
User-agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25)

On 2014-09-11 10:07:49 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >I've just reported a new Debian bug concerning the performance problem.
> 
> It's not clear from http://bugs.debian.org/761157 that the performance
> problem occurs only with -P, but I assume that's what is meant.

It's specific to -P:

2.18-2   0.9s with -P, 0.4s without -P
2.20-3  11.6s with -P, 0.4s without -P

> Since this is a performance bug with PCRE, I suggest moving the Debian bug
> report to the Debian libpcre3 package.  Grep cannot go back to the old way,
> which could cause grep to crash, and the bug cannot be fixed in grep because
> libpcre3 does not provide a fast way to search arbitrary data that may
> include encoding errors.  It really is a problem that requires changes to
> libpcre3 to fix; grep cannot fix it.

Fixing the performance problem in libpcre3 would indeed be better
(even with the old version of grep, libpcre3 was twice as slow as
grep, but this is less critical than a 13x slowdown).

However, a workaround in grep could be simpler. I've just opened a
new bug and suggested several solutions:

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18454

> In the meantime, in order to use 'grep' to search for strings in arbitrary
> data, I suggest omitting the '-P'.

This is a bit annoying because I sometimes use specific PCRE features.
I could try to parse the arguments, detect where the pattern is used,
and avoid -P if the pattern doesn't use specific PCRE features (at
least for the most common forms). An additional advantage is that it
could be twice as fast in most cases (see above). This could also be
done in grep, as I suggested in my new bug report.
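
A minimal sketch of such a wrapper-side check, assuming a purely
syntactic heuristic. The names needs_pcre and grep_flag are
hypothetical and not part of grep. The check is deliberately
conservative: anything that looks PCRE-specific keeps -P, and only
patterns without such constructs fall back to -E.

```python
import re

# Hypothetical heuristic, not part of grep: constructs suggesting that the
# pattern relies on PCRE syntax rather than plain extended regex syntax.
_PCRE_HINTS = re.compile(
    r"""
    \(\?              # (?:...), lookaround, inline flags, named groups
  | \\[A-Za-z0-9]     # \d, \K, \Q, \x41, ... (any alphanumeric escape)
  | [*+?}][?+]        # lazy or possessive quantifiers such as *? or ++
    """,
    re.VERBOSE,
)


def needs_pcre(pattern: str) -> bool:
    """Return True if the pattern appears to use PCRE-specific syntax."""
    return _PCRE_HINTS.search(pattern) is not None


def grep_flag(pattern: str) -> str:
    """Pick -P only when the heuristic finds a PCRE-looking construct."""
    return "-P" if needs_pcre(pattern) else "-E"


if __name__ == "__main__":
    for pat in ["foo|bar", "colou?r", r"\d+", "(?<=a)b", "a.*?b"]:
        print(pat, grep_flag(pat))
```

False positives only cost speed, since the pattern is still handled by
-P. Matching semantics can also differ between the two engines even for
syntax they share (for example which alternative is reported with -o),
so this is a sketch, not a safe drop-in replacement.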

> Also, I suggest using the C locale.

This could be a solution, because in practice, I pipe the result
to "less -FRX", but only grep has to use the C locale, so that the
accented characters are correctly displayed by "less". However, with
some (rare?) patterns, it won't work because an accented character
would no longer be seen as a single character.
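
To illustrate the last point, here is a small sketch using Python's re
module as a stand-in for grep (an assumption made for illustration
only; grep's own matcher is not involved). Matching on bytes roughly
mimics the C locale, and matching on decoded text roughly mimics a
UTF-8 locale.

```python
import re

# "é" is one character but two bytes (0xC3 0xA9) in UTF-8.
line = "café".encode("utf-8")

# Character-oriented matching: "." consumes the whole accented character.
text_match = re.fullmatch(r"caf.", line.decode("utf-8"))

# Byte-oriented matching: "." consumes a single byte, so one "." is not
# enough to cover "é", while two are.
byte_match_one_dot = re.fullmatch(rb"caf.", line)
byte_match_two_dots = re.fullmatch(rb"caf..", line)

print(text_match is not None)           # character view matches
print(byte_match_one_dot is not None)   # byte view with one "." does not
print(byte_match_two_dots is not None)  # byte view needs two "."
```

Patterns that contain no "." or bracket expression touching non-ASCII
characters are not affected by this difference, which is why the
problem should only show up with some patterns.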

-- 
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




