bug#18266: handling bytes not part of the charset, and other garbage

bug-grep

From:	Paul Eggert
Subject:	bug#18266: handling bytes not part of the charset, and other garbage
Date:	Fri, 12 Sep 2014 14:39:35 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0

On 09/12/2014 02:29 PM, Vincent Lefevre wrote:

an option to control what happens on encoding errors would be better and sufficient.

It might suffice for your use cases, but it's more complicated and less flexible than being able to match bytes within the regular expression. (Plus, someone would have to implement it, which is perhaps the biggest objection to either approach ....) But I take your point that \C is best avoided. This whole area is pretty hairy, I'm afraid.

Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using PCRE_MULTILINE shouldn't be that hard, and should boost performance quite a bit in typical usage. Or am I being too optimistic here?
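
To make the multiline question concrete, here is a rough sketch of the two strategies. It uses Python's re module with re.MULTILINE as a stand-in for PCRE_MULTILINE, so it is only an analogy and not grep's actual code; the function names grep_per_line and grep_multiline are hypothetical. It also assumes the pattern can never match across a newline, which a real implementation would have to guarantee or handle.

```python
import re


def grep_per_line(pattern, buf):
    """Call the matcher once for every line in the buffer."""
    rx = re.compile(pattern)
    return [line for line in buf.split(b"\n") if rx.search(line)]


def grep_multiline(pattern, buf):
    """Call the matcher on the whole buffer and let ^ and $ find line edges.

    Assumption: the pattern cannot match across a newline, so each hit
    lies inside exactly one line.
    """
    rx = re.compile(pattern, re.MULTILINE)
    selected = []
    pos = 0
    while pos <= len(buf):
        m = rx.search(buf, pos)
        if m is None:
            break
        # Recover the line that contains the start of the match.
        start = buf.rfind(b"\n", 0, m.start()) + 1
        end = buf.find(b"\n", m.start())
        if end == -1:
            end = len(buf)
        selected.append(buf[start:end])
        # Resume just past this line so each line is reported once.
        pos = end + 1
    return selected


if __name__ == "__main__":
    data = b"foo 1\nbar\nfoo 2\n"
    print(grep_per_line(rb"^foo \d$", data))
    print(grep_multiline(rb"^foo \d$", data))
```

In the multiline variant the matcher is entered once per matching line rather than once per input line, which is where any saving would come from when most lines do not match. Whether that carries over to grep -P would need to be measured.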
