bug#540: 23.0.60; Unicode search bug
From: Chong Yidong
Subject: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 00:15:57 -0400

Hi Handa-san,
Could you take a look at this bug report? Thanks.
Juri Linkov <juri@jurta.org> wrote:
> There is a weird bug in searching Unicode text. The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'. I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of the first two lines.

Here, the failing case is:

           П = 1055 = 10000011111
  inverse(П) = 1087 = 10000111111
                           ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is

           Р = 1056 = 10000100000
  inverse(Р) = 1088 = 10001000000
                           ^^^^^^

I've indicated the last 6 bits, according to the logic in search_buffer
(which I don't fully understand).
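
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It is not Emacs code, and the helper name `low6_report` is made up for illustration. It assumes that "inverse" means the lowercase counterpart and that the interesting question is whether a character and its counterpart agree in every bit above the last 6. That is only one reading of the note above, not a confirmed description of search_buffer.

```python
def low6_report(ch):
    """Return (code point, lowercase code point, same_high_bits).

    same_high_bits is True when the two code points agree in every bit
    above the low 6, i.e. they differ only inside the last 6 bits.
    This criterion is an assumption made for illustration.
    """
    upper = ord(ch)
    lower = ord(ch.lower())
    same_high_bits = (upper >> 6) == (lower >> 6)
    return upper, lower, same_high_bits


if __name__ == "__main__":
    for ch in ("П", "Р"):
        upper, lower, same = low6_report(ch)
        print(
            f"{ch}: {upper} = {upper:011b}, inverse = {lower} = {lower:011b}, "
            f"low 6 bits {upper & 0x3F:06b} -> {lower & 0x3F:06b}, "
            f"same high bits: {same}"
        )
```

Under that assumption, П (1055) and its counterpart (1087) differ only within the last 6 bits, while Р (1056) and its counterpart (1088) also differ above them. This matches which of the two cases ends up with boyer_moore_ok set to 0.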