Re: removing blank lines: "grep ." is really slow
From: Paolo Bonzini
Subject: Re: removing blank lines: "grep ." is really slow
Date: Sat, 24 Apr 2010 08:45:06 +0200
On Fri, Apr 23, 2010 at 22:51, Paul Eggert <address@hidden> wrote:
> Paolo Bonzini <address@hidden> writes:
>
>> On 04/18/2010 06:32 AM, Ivan wrote:
>>> So... right now, "." means "valid UTF-8 character"? Or not?
>>
>> Yes, if your locale is UTF-8.
>
> Wouldn't it be better to model encoding errors as characters? That is,
> if grep sees a byte that cannot possibly be the start of a character, we
> call it a "character" even though it is not in the standard Unicode
> character set. Internally, we could model it as (say) a negative
> number, the negative of the byte value (so it would be in the range -255
> .. -128).

This would have to be changed in glibc first, and then in dfa.c.
Encoding errors in the regex are supported, but "." doesn't match an
invalid character.
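A minimal C sketch of the proposal quoted above, assuming UTF-8 input. `decode_one` and `line_matches_dot` are hypothetical helpers written only for illustration. They are not functions from grep, dfa.c or glibc, and they model the byte-level idea, not the regex matcher. A byte that cannot start a valid sequence becomes the negated byte value (-255 .. -128) and consumes exactly one byte. The `treat_errors_as_chars` flag contrasts the behavior described in the reply (only valid characters match ".") with the proposed one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a grep/glibc API): decode one "character" from
   a UTF-8 buffer of n > 0 bytes.  On success, store the code point (>= 0)
   in *wc and return the number of bytes consumed.  On an encoding error,
   store the negated offending byte (-255 .. -128) and consume one byte,
   so every input byte belongs to some "character". */
static size_t
decode_one (const unsigned char *s, size_t n, long *wc)
{
  unsigned char b;
  size_t len;
  long cp, min;

  assert (n > 0);
  b = s[0];
  if (b < 0x80) { *wc = b; return 1; }               /* ASCII */
  else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; min = 0x80; }
  else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; min = 0x800; }
  else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; min = 0x10000; }
  else goto invalid;        /* 0x80..0xC1, 0xF5..0xFF never start a char */

  if (n < len)
    goto invalid;           /* truncated sequence */
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        goto invalid;       /* missing continuation byte */
      cp = (cp << 6) | (s[i] & 0x3F);
    }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    goto invalid;           /* overlong form, out of range, or surrogate */
  *wc = cp;
  return len;

invalid:
  *wc = -(long) b;          /* b >= 0x80 here, so -255 <= *wc <= -128 */
  return 1;
}

/* Hypothetical model of "grep ." on one line: the line matches if any
   decoded unit matches ".".  With treat_errors_as_chars == 0, "." only
   matches valid characters; with 1, encoding errors count as well. */
static int
line_matches_dot (const unsigned char *s, size_t n, int treat_errors_as_chars)
{
  size_t i = 0;
  while (i < n)
    {
      long wc;
      i += decode_one (s + i, n - i, &wc);
      if (wc >= 0 || treat_errors_as_chars)
        return 1;
    }
  return 0;
}
```

For example, "\xC3\xA9" decodes to U+00E9 in two bytes, while a lone "\xFF" becomes -255 and consumes one byte. A line made only of such bytes would then match "." under the proposal but not under the behavior described in the reply.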
Paolo