From: Paolo Bonzini
Subject: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Tue, 04 Mar 2014 16:49:59 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
On 03/03/2014 07:13, Paul Eggert wrote:
> Norihiro Tanaka wrote:
>> However, I don't understand why the optimization isn't completed in non-UTF8 locales only. Can you explain it?
>
> Sorry, no; there's a lot about that code I don't yet understand.
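IIRC, it's because a CSET matches its byte wherever that byte occurs, including as the second byte of a multibyte character, while the corresponding MBCSET matches that byte only when it stands alone as a single-byte character. For example, say "\x83A" is a two-byte character: the CSET "A" will match its second byte, but the corresponding MBCSET will not.

This can happen in the Shift-JIS encoding, where the second byte of a two-byte character can fall in the ASCII range.

Paolo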
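To make the difference concrete, here is a simplified sketch in C. It is not grep's dfa.c code: the functions `sjis_is_lead_byte`, `find_byte_cset` and `find_byte_mbcset` are hypothetical, and a real DFA matches through state transitions rather than a linear scan. It assumes Shift-JIS lead bytes are 0x81-0x9F and 0xE0-0xFC (the CP932 ranges) and that every lead byte is followed by exactly one trail byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if B is a Shift-JIS lead byte, i.e. the
   first byte of a two-byte character (assumed CP932 ranges). */
static bool sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Byte-oriented search, loosely analogous to a CSET: returns the
   offset of the first byte equal to C, ignoring character
   boundaries, or -1 if there is none. */
static long find_byte_cset(const unsigned char *s, size_t len,
                           unsigned char c)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            return (long) i;
    return -1;
}

/* Character-oriented search, loosely analogous to an MBCSET: walks
   the string one character at a time and matches C only when it is
   a whole single-byte character, never when it is the trail byte of
   a two-byte character.  Returns the offset, or -1. */
static long find_byte_mbcset(const unsigned char *s, size_t len,
                             unsigned char c)
{
    size_t i = 0;
    while (i < len) {
        if (sjis_is_lead_byte(s[i]) && i + 1 < len) {
            i += 2;  /* skip the whole two-byte character */
            continue;
        }
        if (s[i] == c)
            return (long) i;
        i++;
    }
    return -1;
}
```

With the input `{0x83, 'A'}`, `find_byte_cset` reports a match at offset 1, inside the two-byte character, while `find_byte_mbcset` returns -1. Appending a standalone 'A' makes the character-aware search return offset 2.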