bug-grep


From: Paolo Bonzini
Subject: Re: [PATCH] grep -i: work also when converting to lower-case inflates byte count
Date: Sat, 23 Jun 2012 17:06:13 +0200

>> Turkish lowercase i-with-dot is shorter than the uppercase, and
>> uppercase I-without-dot is shorter than the lowercase.
>
> Thanks!  With that, I created a test case and watched it malfunction.
> That demonstrated a couple of invalid (in that case) assumptions in the
> new code.  I suspect that this bug strikes only in relatively few locales.

Yes, my recollection is that it happens only in Turkish and Azeri
locales.  Lithuanian locales have some further oddities in their case
mappings (an accented i keeps its dot, or something like that!), but
I think glibc doesn't implement those.
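
To make the byte-count change concrete, here is a minimal Python sketch.
It does not use grep's code or the C library's locale tables: the
mapping table TR_LOWER and the helpers tr_lower and utf8_len are
hypothetical names that hard-code only the two Turkish/Azeri pairs
mentioned in the quoted text, and UTF-8 is assumed as the encoding.

```python
# Hypothetical, hard-coded Turkish/Azeri lowercase mapping for the
# dotted/dotless i pairs only (not taken from glibc or grep).
TR_LOWER = {
    "\u0130": "i",       # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
    "I": "\u0131",       # I -> LATIN SMALL LETTER DOTLESS I
}


def tr_lower(s: str) -> str:
    """Lowercase s, applying the Turkish/Azeri rule for the two i pairs."""
    return "".join(TR_LOWER.get(ch, ch.lower()) for ch in s)


def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


for upper in ("\u0130", "I"):
    lower = tr_lower(upper)
    print(f"U+{ord(upper):04X} ({utf8_len(upper)} byte(s)) -> "
          f"U+{ord(lower):04X} ({utf8_len(lower)} byte(s))")
```

Under this mapping one conversion shrinks the text by a byte and the
other grows it by a byte, so a buffer sized for the original input is
not guaranteed to hold the lower-cased copy.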

Anyhow, thanks for the work on this longstanding bug!  I think this
was the last regression that was introduced in 2.6, so this is a major
achievement in the 2.x series (perhaps we should have called the 2.6
release 3.0).

The next step would be to add support for Unicode character classes,
and look into converting other multibyte locales to/from UTF-8 in
order to speed up the matches.

Paolo
