bug-gnu-emacs
bug#29871: 25.3; ZWJ word-boundaries in regexps


From: Eli Zaretskii
Subject: bug#29871: 25.3; ZWJ word-boundaries in regexps
Date: Wed, 27 Dec 2017 22:33:22 +0200

> From: "Mark Shoulson" <mark@nagas.meson.org>
> Date: Wed, 27 Dec 2017 14:07:40 -0500
> 
> According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
> it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
> two "word" characters should not constitute a word boundary.  And yet:
> 
> (string-match "\\<" "foo\u200Dfbar" 1)
> 
> evaluates to 4 (the 1 is to skip the word-beginning at the start of the
> string).  Or you can search for "\\b" or "\\>" and get 3.  Either way,
> indicative of a word-break at the ZWJ character.  Is this correct?

Emacs considers a change of script as a word break, and U+200D's
script is 'symbol', which is different from 'latin', the script of the
ASCII characters.
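
The script assignments described above can be inspected directly. The
following is a minimal sketch, not taken from the thread: it assumes the
standard char-table `char-script-table', and the helper name
`my/char-scripts' is hypothetical. The scripts reported may differ
between Emacs versions.

```elisp
;; Hypothetical helper: pair each character of STRING with the script
;; that `char-script-table' assigns to it.
(defun my/char-scripts (string)
  "Return an alist of (CHAR . SCRIPT) for each character in STRING."
  (mapcar (lambda (ch) (cons ch (aref char-script-table ch)))
          (string-to-list string)))

;; Inspect the string from the report.  According to the reply above,
;; U+200D is assigned the script `symbol' while the ASCII letters are
;; `latin', so the script changes on both sides of the ZWJ.
(my/char-scripts "foo\u200Dfbar")

;; The search from the report; the reporter saw 4 in Emacs 25.3.
(string-match "\\<" "foo\u200Dfbar" 1)
```

Comparing adjacent entries of the returned alist shows where the script
changes, which is where the reply says Emacs places a word break.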




