bug#18266: handling bytes not part of the charset, and other garbage
From: Vincent Lefevre
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Sat, 13 Sep 2014 00:40:33 +0200
User-agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25)

On 2014-09-12 14:39:35 -0700, Paul Eggert wrote:
> On 09/12/2014 02:29 PM, Vincent Lefevre wrote:
> >an option to control what happens on encoding errors would be
> >better and sufficient.
>
> It might suffice for your use cases, but it's more complicated and less
> flexible than being able to match bytes within the regular expression.

But IMHO, some solutions I proposed would be faster.

I wonder whether anyone is interested in matching individual bytes
in a file regarded as UTF-8 encoded. This seems weird.

> Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using
> PCRE_MULTILINE shouldn't be that hard, and should boost performance
> quite a bit in typical usage. Or am I being too optimistic here?

Perhaps in text files. In binary files, with the current solution,
I don't think this matters as failures due to invalid bytes
typically occur several times per line.
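
As a rough illustration of that last point (this is not grep's code, and the helper name `count_invalid_utf8` is made up for the example), here is a minimal Python sketch that counts how many times strict UTF-8 decoding fails within a single line. It assumes that each failure corresponds to one restart of the matcher, which is a simplification of what grep does.

```python
def count_invalid_utf8(line: bytes) -> int:
    """Count how many times strict UTF-8 decoding of `line` fails.

    Hypothetical helper for illustration only: after each failure,
    decoding resumes just past the offending byte sequence.
    """
    failures = 0
    pos = 0
    while pos < len(line):
        try:
            line[pos:].decode("utf-8")
            break  # the remainder of the line is valid UTF-8
        except UnicodeDecodeError as err:
            failures += 1
            # err.end is relative to the slice and is always >= 1,
            # so the loop makes progress on every failure.
            pos += err.end
    return failures


if __name__ == "__main__":
    # A valid UTF-8 text line and a line with two stray bytes.
    print(count_invalid_utf8("café".encode("utf-8")))
    print(count_invalid_utf8(b"a\xffb\xfec"))
```

Running such a count over the lines of a typical binary file (split on `b"\n"`) would be one way to check whether failures really do occur several times per line.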
--
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)