bug#13084: boyer_moore crashes with certain characters in the case table


From: Kenichi Handa
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Date: Sat, 15 Dec 2012 22:17:17 +0900

In article <83obhxoo2v.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Here, A and B belong to the same character group iff A and
> > B have the same multibyte sequence except for the last byte.
> > Under this condition, we should be able to use the table
> > simple_translate.

> OK, then maybe just the comments need to be fixed.  They shouldn't
> talk about "charset" and "row", which are undefined in Unicode Emacs.
> They should instead use terminology that corresponds to the UTF-8
> multibyte representation of characters we use today.

I've just committed the following change.  How does it look?

=== modified file 'src/search.c'
--- src/search.c        2012-10-10 20:09:47 +0000
+++ src/search.c        2012-12-15 13:04:46 +0000
@@ -1313,8 +1313,11 @@
             non-nil, we can use boyer-moore search only if TRT can be
             represented by the byte array of 256 elements.  For that,
             all non-ASCII case-equivalents of all case-sensitive
-            characters in STRING must belong to the same charset and
-            row.  */
+            characters in STRING must belong to the same character
+            group (two characters belong to the same group iff their
+            multibyte forms are the same except for the last byte;
+            i.e. every 64 characters form a group; U+0000..U+003F,
+            U+0040..U+007F, U+0080..U+00BF, ...).  */
 
          while (--len >= 0)
            {
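
To make the "character group" wording concrete, here is a minimal
sketch in C.  It is not code from search.c: utf8_encode, same_prefix
and same_block are hypothetical names, and the encoder assumes
standard UTF-8 up to U+10FFFF rather than Emacs's extended internal
encoding.  It compares the multibyte forms byte by byte, and also
shows the arithmetic shortcut implied by "every 64 characters form a
group".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode code point C as standard UTF-8 into BUF (at least 4 bytes)
   and return the number of bytes written.  Hypothetical helper, not
   the Emacs CHAR_STRING macro.  */
static size_t
utf8_encode (unsigned int c, unsigned char *buf)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      buf[0] = c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = 0xC0 | (c >> 6);
      buf[1] = 0x80 | (c & 0x3F);
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = 0xE0 | (c >> 12);
      buf[1] = 0x80 | ((c >> 6) & 0x3F);
      buf[2] = 0x80 | (c & 0x3F);
      return 3;
    }
  buf[0] = 0xF0 | (c >> 18);
  buf[1] = 0x80 | ((c >> 12) & 0x3F);
  buf[2] = 0x80 | ((c >> 6) & 0x3F);
  buf[3] = 0x80 | (c & 0x3F);
  return 4;
}

/* Byte-level test: nonzero iff the multibyte forms of A and B have
   the same length and differ at most in the last byte.  */
static int
same_prefix (unsigned int a, unsigned int b)
{
  unsigned char ba[4], bb[4];
  size_t la = utf8_encode (a, ba);
  size_t lb = utf8_encode (b, bb);
  return la == lb && memcmp (ba, bb, la - 1) == 0;
}

/* Arithmetic test: nonzero iff A and B fall in the same block of 64
   code points (U+0000..U+003F, U+0040..U+007F, ...).  */
static int
same_block (unsigned int a, unsigned int b)
{
  return (a >> 6) == (b >> 6);
}
```

For non-ASCII characters the two tests agree, because the last byte
of a multibyte sequence carries exactly the low 6 bits of the code
point.  For single-byte ASCII the prefix is empty, so only same_block
reproduces the 64-character blocks listed in the comment.  For
example, U+00E9 and U+00C9 both start with the byte C3 and are in the
same group, while U+00B5 (bytes C2 B5) and U+039C (bytes CE 9C) are
not.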

---
Kenichi Handa
handa@gnu.org