From: Paul Eggert
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 14:39:35 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0
On 09/12/2014 02:29 PM, Vincent Lefevre wrote:
> an option to control what happens on encoding errors would be better and sufficient.
It might suffice for your use cases, but it's more complicated and less flexible than being able to match bytes within the regular expression. (Plus, someone would have to implement it, which is perhaps the biggest objection to either approach ....) But I take your point that \C is best avoided. This whole area is pretty hairy, I'm afraid.
Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using PCRE_MULTILINE shouldn't be that hard, and should boost performance quite a bit in typical usage. Or am I being too optimistic here?
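To make the PCRE_MULTILINE idea concrete, here is a rough sketch of the two strategies. It is not grep's code and does not use PCRE: it uses Python's re module, whose re.MULTILINE flag plays a similar role, and the function names are made up for illustration. It also assumes the pattern itself can never match a newline, so a match cannot span two lines. Whether the single-scan version is actually faster depends on the engine and the input, so treat it as an illustration of the approach rather than a measurement.

```python
import re


def matching_lines_per_line(pattern, text):
    """Baseline: split the buffer and call the matcher once per line."""
    rx = re.compile(pattern)
    return [line for line in text.split("\n") if rx.search(line)]


def matching_lines_multiline(pattern, text):
    """Scan the whole buffer with re.MULTILINE so that ^ and $ match at
    line boundaries, then recover the line around each match.

    Assumption: the pattern cannot match a newline character.
    """
    rx = re.compile(pattern, re.MULTILINE)
    lines = []
    pos = 0
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        # Expand the match to the boundaries of the line containing it.
        start = text.rfind("\n", 0, m.start()) + 1
        end = text.find("\n", m.start())
        if end == -1:
            end = len(text)
        lines.append(text[start:end])
        # Resume just past this line so each line is reported once.
        pos = end + 1
    return lines


if __name__ == "__main__":
    sample = "alpha\nbeta 42\ngamma\ndelta 7\n"
    print(matching_lines_per_line(r"\d+$", sample))
    print(matching_lines_multiline(r"\d+$", sample))
```

The point of the second function is that the matcher is entered once per matching line instead of once per input line, which is where any saving would come from when most lines do not match.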