Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets
From: Jim Meyering
Subject: Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets
Date: Mon, 15 Mar 2010 12:11:31 +0100

Paolo Bonzini wrote:
>> Hi Paolo,
>>
>> Do you have a test that exercises this fix?
>> As far as I can see, the above tests currently succeed
>> with grep built from master. I expected them to fail.
>
> It passes because strcoll (and hence ranges) is case-insensitive in
> many locales:
>
> $ printf '1\ny\n.\n' | LC_ALL=en_US.UTF-8 grep '[A-Z]'
> y
>
> (Note the absence of -i.) It would fail in something like C.UTF-8,
> but that locale is not portable and, as far as I know, it only works
> under Cygwin; not even glibc supports it:
>
> $ LC_ALL=C.UTF-8 bash
> bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
>
> So I included the test more for completeness than anything else,
> hoping that we get coverage on a system where strcoll is
> case-sensitive.

Well, I would really like a test that passes with that fix and fails
without it, so how about using something like this?

This shows that grep-2.5.3 gets it wrong:
$ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 grep -i '[a-z]'
A
and with your fix, grep -i does what we would expect:
$ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 src/grep -i '[a-z]'
A
Z
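
For illustration only, here is a minimal sketch of how the comparison above might be wrapped into a script-style check. The function name range_fold_check and the use of exit status 77 for "skipped" (the Automake convention) are assumptions, not part of the patch series. It also assumes that `locale -a` is available to detect whether en_US.UTF-8 is installed.

```shell
#!/bin/sh
# Hypothetical regression check; the function name and the exit-code
# convention (77 = skipped, as in Automake test suites) are assumptions.
range_fold_check() {
  grep_bin=${1:-grep}   # grep binary under test, e.g. src/grep
  loc=en_US.UTF-8

  # Skip when the locale is not installed; the result would be meaningless.
  locale -a 2>/dev/null | grep -Eiq '^en_US\.utf-?8$' || return 77

  # With -i, the range [a-z] should match both upper-case lines.
  expected=$(printf '%s\n' A Z)
  actual=$(printf '%s\n' A Z | LC_ALL=$loc "$grep_bin" -i '[a-z]')

  # Status 0 when both lines are printed, nonzero otherwise.
  [ "$actual" = "$expected" ]
}
```

Calling `range_fold_check src/grep` would then exercise the freshly built binary, while `range_fold_check grep` checks whichever grep is installed.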