Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets

bug-grep

From:	Jim Meyering
Subject:	Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets
Date:	Mon, 15 Mar 2010 12:11:31 +0100

Paolo Bonzini wrote:
>> Hi Paolo,
>>
>> Do you have a test that exercises this fix?
>> As far as I can see, the above tests currently succeed
>> with grep built from master.  I expected them to fail.
>
> It passes because strcoll (and hence ranges) is case-insensitive in
> many locales:
>
> $ printf '1\ny\n.\n' | LC_ALL=en_US.UTF-8 grep '[A-Z]'
> y
>
> (Note no -i).  It would fail in something like C.UTF-8, but that
> locale is not portable; as far as I know it only works under Cygwin,
> and not even glibc supports it:
>
> $ LC_ALL=C.UTF-8 bash
> bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
>
> So I included the test more for completeness than anything else,
> hoping that we get coverage on a system where strcoll is case
> sensitive.

Well, I would really like a test that passes with,
and fails without, that fix, so how about using something like this:

This shows that grep-2.5.3 gets it wrong:

    $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 grep -i '[a-z]'
    A

and with your fix, grep -i does what we would expect:

    $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 src/grep -i '[a-z]'
    A
    Z
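
The two commands above could be wrapped into a self-checking script along
the following lines.  This is only a sketch, not the test that was
committed: the GREP and TEST_LOCALE variables and the function name are
made up for illustration, and the locale probe via "locale charmap" is an
assumption about how to detect a missing en_US.UTF-8 locale.

```shell
#!/bin/sh
# Sketch of a regression check for case-folded ranges in a UTF-8 locale.
# GREP and TEST_LOCALE are illustrative names, not part of grep's test suite.
GREP=${GREP:-grep}
TEST_LOCALE=${TEST_LOCALE:-en_US.UTF-8}

check_case_fold_range() {
  # If the locale is not installed, the command would silently run in a
  # fallback locale and prove nothing, so report "skip" instead.
  charmap=$(LC_ALL=$TEST_LOCALE locale charmap 2>/dev/null)
  if [ "$charmap" != UTF-8 ]; then
    echo skip
    return 0
  fi

  # Under -i, both uppercase letters must match the lowercase range.
  expected=$(printf '%s\n' A Z)
  actual=$(printf '%s\n' A Z | LC_ALL=$TEST_LOCALE "$GREP" -i '[a-z]')

  if [ "$actual" = "$expected" ]; then
    echo pass
  else
    echo fail
  fi
}

check_case_fold_range
```

Running it with GREP=src/grep would exercise the freshly built binary
rather than the installed one; a "fail" result corresponds to the
grep-2.5.3 output shown above, where only A is printed.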
