bug#18266: grep -P and invalid exits with error

bug-grep


From:	Paul Eggert
Subject:	bug#18266: grep -P and invalid exits with error
Date:	Mon, 01 Sep 2014 01:31:53 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

Vincent Lefevre wrote:

       [...] Note that this option can also be passed to pcre_exec()
       and pcre_dfa_exec(), to suppress the validity checking of
       subject strings only. If the same string is being matched
       many times, the option can be safely set for the second and
       subsequent matchings to improve performance.

The last sentence would imply that the UTF8 checking is done on the
whole input buffer before matching is done.

That's pretty subtle, and perhaps too subtle. A plausible interpretation of the phrase "same string is being matched" is that libpcre checks only the matched string, and that bytes after the match (which did not need to be examined to do the match) are not checked. Can you confirm with the libpcre authors that this plausible interpretation is incorrect, i.e., that the entire input string is checked, even the unmatched part? If that's what is intended, the documentation should state so clearly, so at least there's a documentation bug there.
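To make the two readings concrete, here is a minimal sketch in Python. It is not libpcre code and says nothing about which reading libpcre actually implements; the function names are hypothetical, and Python's re module on bytes stands in for the matcher.

```python
import re


def first_invalid_utf8_offset(buf: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        buf.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def search_checking_whole_subject(pattern: bytes, subject: bytes):
    """Reading 1 (hypothetical): validate the entire subject before matching."""
    bad = first_invalid_utf8_offset(subject)
    if bad is not None:
        raise ValueError(f"invalid UTF-8 at offset {bad}")
    return re.search(pattern, subject)


def search_checking_matched_prefix(pattern: bytes, subject: bytes):
    """Reading 2 (hypothetical): validate only the bytes up to the end of the match."""
    m = re.search(pattern, subject)
    # With no match, every byte had to be examined, so check all of them.
    examined = subject if m is None else subject[:m.end()]
    bad = first_invalid_utf8_offset(examined)
    if bad is not None:
        raise ValueError(f"invalid UTF-8 at offset {bad}")
    return m
```

With the subject b"abc\n\xff" and the pattern b"abc", the first function raises because of the trailing 0xFF byte, while the second returns a match. That difference is exactly what the documentation leaves open.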

If there are many invalid UTF8 bytes, this would be slow, IMHO.


That's OK.  We don't need grep -P to be fast on invalid input.

But is the copy of the buffer really needed? Couldn't the invalid
UTF8 sequences just be replaced by null bytes?

I'd rather not, because that changes the semantics of matching. The null byte is valid input data that might get matched.
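A small sketch of the problem, again in Python rather than in grep's C, and with a hypothetical helper name: once invalid bytes are overwritten with NUL, a pattern that looks for a NUL byte matches data that never contained one.

```python
import re


def replace_invalid_with_nul(buf: bytes) -> bytes:
    """Overwrite each invalid UTF-8 sequence with NUL bytes (hypothetical helper)."""
    out = bytearray(buf)
    pos = 0
    while pos < len(out):
        try:
            out[pos:].decode("utf-8")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice starting at pos.
            bad_start, bad_end = pos + err.start, pos + err.end
            for i in range(bad_start, bad_end):
                out[i] = 0
            pos = bad_end
    return bytes(out)


original = b"a\xffb"            # contains an invalid byte, but no NUL
patched = replace_invalid_with_nul(original)
pattern = rb"a\x00b"            # looks for a literal NUL between 'a' and 'b'

no_match = re.search(pattern, original)    # None: the input has no NUL byte
false_hit = re.search(pattern, patched)    # matches only because of the rewrite
```

The second search succeeds purely as a side effect of the replacement, which is the change in semantics I want to avoid.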

in case of invalid UTF8 bytes, in some (many?) cases, the
cause is a binary file (possibly with some text in it), where lines
can be very long. So, wouldn't it mean that it can take significantly
more memory?


Sure.  But that's the same for -P as it is for plain grep.
