bug-grep

Re: removing blank lines: "grep ." is really slow


From: Paolo Bonzini
Subject: Re: removing blank lines: "grep ." is really slow
Date: Sat, 24 Apr 2010 08:45:06 +0200

On Fri, Apr 23, 2010 at 22:51, Paul Eggert <address@hidden> wrote:
> Paolo Bonzini <address@hidden> writes:
>
>> On 04/18/2010 06:32 AM, Ivan wrote:
>>> So... right now, "." means "valid UTF-8 character"? Or not?
>>
>> Yes, if your locale is UTF-8.
>
> Wouldn't it be better to model encoding errors as characters?  That is,
> if grep sees a byte that cannot possibly be the start of a character, we
> call it a "character" even though it is not in the standard Unicode
> character set.  Internally, we could model it as (say) a negative
> number, the negative of the byte value (so it would be in the range -255
> .. -128).

This would have to be changed in glibc first, and then in dfa.c.

Encoding errors in the regex are supported, but . doesn't capture an
invalid character.

Paolo
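
The proposal quoted above (treating each undecodable byte as its own
"character", stored as the negative of the byte value) can be sketched
roughly as follows. This is only an illustration in Python, not code from
grep, glibc, or dfa.c; the names decode_with_errors and dot_matches are
hypothetical, and the choice to emit one negative value per offending byte
is an assumption about how the idea might be applied.

```python
def decode_with_errors(data: bytes) -> list[int]:
    """Decode UTF-8 into code points; each undecodable byte becomes -(byte)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Expected sequence length from the lead byte (0 = cannot start a character).
        if b < 0x80:
            n = 1
        elif 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0
        if n:
            try:
                # Strict decoding rejects truncated, overlong, and surrogate sequences.
                out.append(ord(data[i:i + n].decode("utf-8")))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Encoding error: model the byte as a negative "character" (-255 .. -128).
        out.append(-b)
        i += 1
    return out


def dot_matches(c: int) -> bool:
    """Hypothetical '.' under this model: any character except newline,
    including the negative values that stand for encoding errors."""
    return c != ord("\n")


if __name__ == "__main__":
    print(decode_with_errors(b"a\xffb"))
```

Under this model a "." implemented like dot_matches would accept the stray
byte in b"a\xffb", which is the behavior the proposal asks for and which,
per the reply above, "." does not currently have.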



