From: Paolo Bonzini
Subject: Re: [PATCH] grep -i: work also when converting to lower-case inflates byte count
Date: Sat, 23 Jun 2012 17:06:13 +0200

>> Turkish lowercase i-with-dot is shorter than the uppercase, and
>> uppercase I-without-dot is shorter than the lowercase.
>
> Thanks! With that, I created a test case and watched it malfunction.
> That demonstrated a couple of invalid (in that case) assumptions in the
> new code. I suspect that this bug strikes only in relatively few locales.

Yes, my recollection is that it only happens in Turkish and Azeri
locales. Some more strangeness with case mappings occurs in
Lithuanian locales (an accented i keeps its dot, or something like
that!) but I think glibc doesn't implement that.
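
To make the byte-count point concrete, here is a minimal Python sketch (not grep's code). It assumes UTF-8 and writes the two Turkish/Azeri case pairs out by hand, since Python's str.lower() is locale-independent; the names TURKISH_LOWER, turkish_lower and byte_delta are made up for this illustration.

```python
# Hand-written Turkish/Azeri lower-case pairs (an assumption of this sketch;
# Python's built-in str.lower() does not apply locale-specific rules).
TURKISH_LOWER = {
    "I": "\u0131",   # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    "\u0130": "i",   # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
}


def utf8_len(text):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def turkish_lower(text):
    """Lower-case text, using the Turkish pairs above where they apply."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in text)


def byte_delta(text):
    """Bytes gained (positive) or lost (negative) by lower-casing text."""
    return utf8_len(turkish_lower(text)) - utf8_len(text)


if __name__ == "__main__":
    for word in ("I", "\u0130", "KIRIK"):
        print(repr(word), utf8_len(word), "->", utf8_len(turkish_lower(word)))
```

Under these assumptions "I" grows from one byte to two and "\u0130" shrinks from two bytes to one, so a buffer sized for the original line can be too small for its lower-cased copy, and byte offsets in one no longer line up with the other.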

Anyhow, thanks for the work on this longstanding bug! I think this
was the last regression that was introduced in 2.6, so this is a major
achievement in the 2.x series (perhaps we should have called the 2.6
release 3.0).

The next step would be to add support for Unicode character classes,
and look into converting other multibyte locales to/from UTF-8 in
order to speed up the matches.

Paolo