bug#540: 23.0.60; Unicode search bug

bug-gnu-emacs

From:	Chong Yidong
Subject:	bug#540: 23.0.60; Unicode search bug
Date:	Wed, 27 Aug 2008 00:15:57 -0400

Hi Handa-san,

Could you take a look at this bug report?  Thanks.

Juri Linkov <juri@jurta.org> wrote:
> There is a weird bug in searching Unicode text.  The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'.  I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of first two lines.

Here, the failing case is:

П          = 1055 = 10000011111
inverse(П) = 1087 = 10000111111
                         ^^^^^^

whereas the case that works (because boyer_moore_ok gets set to 0, so
simple_search is used instead) is

Р          = 1056 = 10000100000
inverse(Р) = 1088 = 10001000000
                         ^^^^^^

I've marked the last 6 bits of each code, since that is what the logic
in search_buffer (which I don't fully understand) appears to compare.
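
To make the bit comparison above easier to check, here is a small Python
sketch (not Emacs code). It assumes that "inverse" means the lowercase
counterpart, and uses Python's str.lower() as a stand-in for whatever
translation table search_buffer consults. The helper name `describe' and
the constant LOW_BITS are made up for illustration. It only tests whether
a letter and its counterpart agree in every bit above the last 6; whether
that is exactly the condition search_buffer checks is a guess.

```python
LOW_BITS = 6  # number of trailing bits marked with carets above


def describe(ch):
    """Return (code, counterpart code, True if they agree above the low 6 bits)."""
    code = ord(ch)
    # Assumption: str.lower() stands in for the "inverse" in the message.
    folded = ord(ch.lower())
    same_high = (code >> LOW_BITS) == (folded >> LOW_BITS)
    return code, folded, same_high


for ch in "ПР":
    code, folded, same_high = describe(ch)
    print(f"{ch}: {code:011b} -> {folded:011b}  same high bits: {same_high}")
```

For П the two codes differ only within the last 6 bits, whereas for Р the
difference carries into the seventh bit, which matches the split between
the failing and the working case.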
