
bug#540: 23.0.60; Unicode search bug


From: Chong Yidong
Subject: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 00:15:57 -0400

Hi Handa-san,

Could you take a look at this bug report?  Thanks.

Juri Linkov <juri@jurta.org> wrote:
> There is a weird bug in searching Unicode text.  The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'.  I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of the first two lines.

Here, the failing case is:

П          = 1055 = 10000011111
inverse(П) = 1087 = 10000111111
                         ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is

Р          = 1056 = 10000100000
inverse(Р) = 1088 = 10001000000
                         ^^^^^^

I've indicated the last 6 bits, according to the logic in search_buffer
(which I don't fully understand).





