bug-gnu-emacs

bug#13041: 24.2; diacritic-fold-search


From: Juri Linkov
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sun, 09 Dec 2012 01:07:12 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (x86_64-pc-linux-gnu)

> This means that when you type the second "f" you might get a match
> before the present one.  Consider a buffer containing the two lines
> suffer
> suffer
>
> Typing "suf" as search string would go to "suffer".  Adding an "f" to
> the search string now would go back to "suffer" (or not).
Going back looks like backtracking in the regexp search.

OTOH, instead of matching only a full match as Chromium does,
we could do what GEdit and OpenOffice do: match the whole
ligature character on a partial match (i.e. match the "ff"
ligature when the search string is just "f").

This has the problem of highlighting the whole character for
a partial match, which looks wrong, but perhaps no one can do better.
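
To illustrate the partial-match behavior, here is a minimal sketch in
Python (not Emacs code, and not how GEdit or OpenOffice actually
implement it).  It assumes that ligatures can be expanded with Unicode
compatibility decomposition (NFKD); the helper names `fold_with_map`
and `search_partial` are made up for this example.  Note that NFKD
also splits accented letters into a base letter plus combining marks,
so a plain "e" would match inside "é" in the same way.

```python
import unicodedata


def fold_with_map(text):
    """Expand each character with NFKD and record, for every expanded
    character, the index of the original character it came from."""
    folded = []
    origin = []
    for index, char in enumerate(text):
        for expanded in unicodedata.normalize("NFKD", char):
            folded.append(expanded)
            origin.append(index)
    return "".join(folded), origin


def search_partial(text, needle):
    """Return (begin, end) indices into the original text for the first
    match of needle, or None.  A match that ends inside an expanded
    character covers that whole character."""
    if not needle:
        return None
    folded, origin = fold_with_map(text)
    folded_needle = unicodedata.normalize("NFKD", needle)
    pos = folded.find(folded_needle)
    if pos < 0:
        return None
    begin = origin[pos]
    end = origin[pos + len(folded_needle) - 1] + 1
    return begin, end


# "su" + U+FB00 (LATIN SMALL LIGATURE FF) + "er"
text = "su\ufb00er"
span_f = search_partial(text, "f")
span_suf = search_partial(text, "suf")
span_suff = search_partial(text, "suff")
```

Here "suf" and "suff" both yield the same span, ending after the
ligature, so the highlighting cannot show how much of the ligature
was really typed.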

>> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
> was talking about the (weak?) equivalence of "ß" and "ss" (lower case
> "s") which is much more important when searching.  In particular so,
> because many German words that were earlier written with an "ß" are now
> written with "ss".

Yes, this is what I meant too.  Surprisingly,
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
defines "ß" as equivalent to "ss" (lower case "s")
rather than folding it to upper case.  The following line in CaseFolding.txt:

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

maps 00DF (LATIN SMALL LETTER SHARP S) to the two-character sequence
0073 0073 (LATIN SMALL LETTER S twice), keeping the lower case.
Maybe this is a bug in the Unicode data?
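
For reference, a small Python sketch of how the line above reads as
data.  The function name `parse_case_folding_line` is hypothetical.
The field layout (code; status; mapping; # name) is taken from the
line itself, and status "F" marks a full case folding, i.e. one whose
result may be longer than one character.  Python's built-in
`str.casefold` gives the same lower-case result for "ß".

```python
def parse_case_folding_line(line):
    """Parse one data line of CaseFolding.txt into
    (character, status, folded string)."""
    data = line.split("#", 1)[0]          # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    char = chr(int(fields[0], 16))
    status = fields[1]
    mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    return char, status, mapping


line = "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S"
char, status, mapping = parse_case_folding_line(line)

# Compare with Python's own full case folding.
builtin_fold = char.casefold()
```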




