From: Jim Meyering
Subject: Re: Bug#624387: [bug #33198] Incorrect bracket expression when parsing in ru_RU.KOI8-R (Russian locale)
Date: Fri, 03 Jun 2011 19:09:22 +0200

Paolo Bonzini wrote:
> On 06/02/2011 11:08 PM, Jim Meyering wrote:
>> #if MBS_SUPPORT
>> - int b2 = wctob ((unsigned char) b);
>> - if (b2 == EOF || b2 == b)
>> + /* Below, note how when b2 != b and we have a uni-byte locale
>> + (MB_CUR_MAX == 1), we set b = b2. I.e., in a uni-byte locale,
>> + we can safely call setbit with a non-EOF value returned by wctob.
>> */
>> + int b2 = wctob (b);
>> + if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
>
> Can you explain again the reason for testing "b2 == EOF"? It seems
> wrong, and without it you can just make
>
> if (MB_CUR_MAX == 1 || b2 == b)
> setbit ((unsigned char) b, c);

Hi Paolo,

Your test would disable DFA-based matching for some bytes in a locale like
ru_RU.KOI8-R, because a pattern like [\360] leads to "wint_t b" having
the value 1055 (0x041F), and that is obviously too large to
be used as the first argument to setbit.  However, converting
that b back to a single-byte value, b2, gives us back \360,
which is ok to use there.  Hence the "(b=b2)" part of that
admittedly ugly expression.
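
The round trip described above can be sketched in isolation. This is only an illustration, not code from grep: roundtrip_byte and print_roundtrip are hypothetical helpers, and the sketch assumes the caller has already selected a locale with setlocale (for the case in this message, ru_RU.KOI8-R, which must be installed on the system). Per the description above, in that locale the byte \360 should widen to 0x041F and narrow back to \360.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not part of grep): widen one byte with btowc,
   then narrow the result again with wctob.  Returns the byte that
   comes back, or EOF if either conversion fails in the current
   locale.  */
int
roundtrip_byte (int byte)
{
  assert (0 <= byte && byte <= UCHAR_MAX);
  wint_t wc = btowc (byte);
  if (wc == WEOF)
    return EOF;
  return wctob (wc);
}

/* Hypothetical helper: print the wide value that BYTE maps to and
   the value obtained by converting it back.  */
void
print_roundtrip (int byte)
{
  wint_t wc = btowc (byte);
  if (wc == WEOF)
    printf ("0x%02x: not a complete character in this locale\n", byte);
  else
    printf ("0x%02x -> wide 0x%04lx -> %d\n",
            byte, (unsigned long) wc, roundtrip_byte (byte));
}
```

Calling setlocale (LC_ALL, "ru_RU.KOI8-R") and then print_roundtrip (0xF0) is one way to check the values quoted above on a system that has that locale; the wide value can exceed 255 even though the byte it came from does not.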

The b2 == EOF part is required for the somewhat similar bug I fixed
a month ago:

    fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff
    8da41c930e03a8635cbd8c89e3e591374c232c89

The corresponding test demonstrates the need:

    tests: exercise bug with 0x80..0xff in [...]
    d98338ebf842ec9b69631837eee50ebdcd543505

Thanks for the feedback.
If you see a better way, I'm sure you'll let me know.

BTW, seeing your cast, I now think it'd be prudent to
guard that setbit use:

#if MBS_SUPPORT
          /* Below, note how when b2 != b and we have a uni-byte locale
             (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
             we can safely call setbit with a non-EOF value returned by wctob.  */
          int b2 = wctob (b);
          if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
#endif
            if (b < 256)
              setbit (b, c);
        }
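
For readers who want to experiment with the guarded logic outside of dfa.c, here is a minimal standalone sketch. It is an assumption-laden stand-in, not grep's code: byteset, byteset_set, byteset_test and byteset_add_wide are hypothetical names playing the role of the charclass type and setbit, and the conditional above is unrolled into plain if statements with the same outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* MB_CUR_MAX */
#include <wchar.h>

#define NBYTES 256

/* Hypothetical stand-in for a character class: one bit per byte value.  */
typedef struct { unsigned char bits[NBYTES / 8]; } byteset;

/* Set bit B; callers must ensure B is a valid byte value.  */
void
byteset_set (byteset *s, unsigned int b)
{
  assert (b < NBYTES);
  s->bits[b / 8] |= (unsigned char) (1u << (b % 8));
}

/* Return true if bit B is set.  */
bool
byteset_test (const byteset *s, unsigned int b)
{
  assert (b < NBYTES);
  return (s->bits[b / 8] >> (b % 8)) & 1;
}

/* Mirror of the guarded snippet: narrow B with wctob, substitute the
   narrowed value only in a uni-byte locale, and refuse to set a bit
   for any value that does not fit in the set.  Returns true if a bit
   was set.  */
bool
byteset_add_wide (byteset *s, wint_t b)
{
  int b2 = wctob (b);
  if (b2 != EOF && (wint_t) b2 != b)
    {
      if (MB_CUR_MAX != 1)
        return false;           /* multibyte locale: leave the set alone */
      b = (wint_t) b2;          /* uni-byte locale: use the narrowed byte */
    }
  if (b >= NBYTES)
    return false;               /* the "b < 256" guard */
  byteset_set (s, b);
  return true;
}
```

With a zero-initialized byteset, adding an ASCII letter sets the matching bit, while a wide value such as 0x041F that wctob cannot narrow in the current locale is rejected by the range check instead of indexing past the end of the set.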