bug-gnu-emacs

bug#13041: 24.2; diacritic-fold-search


From: martin rudalics
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sat, 08 Dec 2012 12:21:48 +0100

>> - leave the text alone but give each string that should be handled
>>   specially a text property with the normalized form.  In this case
>>   searching has to pay attention to these properties, if present.
>>
>> - normalize the text and give each normalized string a text property
>>   with the original text.  In this case searching will proceed as usual
>>   but you have to restore the original text when done.
>
> This reminds me of the idea that searching should take into account the
> text displayed with the `display' property and other display-related
> properties.  It seems this is more difficult to implement.

... and probably should include searching for overlays too.
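To make the first option quoted above concrete, here is a minimal sketch in Python rather than Emacs Lisp.  It leaves the text alone and keeps the normalized form on the side, together with a list of offsets that plays the role a text property would play in a buffer.  The names fold_with_offsets and fold_search are made up for illustration, case folding is left out, and nothing here reflects how Emacs search is actually implemented.

```python
import unicodedata

def fold_with_offsets(text):
    """Return (folded, offsets).

    offsets[i] is the index in `text` of the original character that
    produced folded[i].  This is a stand-in for a text property that
    carries the normalized form (an assumption for illustration).
    """
    folded, offsets = [], []
    for i, ch in enumerate(text):
        # NFKD splits a precomposed accented letter into base letter plus
        # combining mark, and a ligature such as U+FB00 into "ff".
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop the diacritic itself
            folded.append(part)
            offsets.append(i)
    return "".join(folded), offsets

def fold_search(text, query):
    """Return the (begin, end) span of the first match in the ORIGINAL text."""
    folded, offsets = fold_with_offsets(text)
    needle, _ = fold_with_offsets(query)
    if not needle:
        return None
    pos = folded.find(needle)
    if pos < 0:
        return None
    # Map both ends of the match back to original character positions.
    return offsets[pos], offsets[pos + len(needle) - 1] + 1

print(fold_search("Ein caf\u00e9", "cafe"))   # (4, 8)
print(fold_search("su\ufb00er", "suf"))       # (0, 3)
```

Note that the second call returns a span covering the whole ligature character, although the query only asked for one "f".  That is exactly the return-value and highlighting problem raised in the quoted text that follows.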

>> Also I don't know how to handle the return value and/or highlighting
>> when, for example, finding a match for "suf" within "suﬀer".  For
>> example, replacing each occurrence of "suf" with the empty string should
>> leave us with "fer" here.
>
> I believe such ligature characters should be handled as a whole,
> i.e. "suf" doesn't match "suﬀer", only "suff" should match it.

This means that when you type the second "f" you might get a match
before the present one.  Consider a buffer containing the two lines

suﬀer
suffer

where the first line is written with the "ﬀ" ligature (U+FB00) and the
second with two plain "f"s.  Typing "suf" as search string would go to
the second line.  Adding an "f" to the search string now would go back
to the first line (or not).  Disconcerting in any case.
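The jump can be reproduced with a small Python sketch.  It assumes the first buffer line uses the ligature U+FB00 and the second uses two plain "f"s, and it treats a match as valid only when it does not begin or end inside an expanded character, which is one possible reading of "handled as a whole".  The helper names expand and whole_char_search are hypothetical.

```python
import unicodedata

def expand(text):
    """Expand compatibility characters (U+FB00 -> "ff") and record, for
    each expanded character, the index of the original it came from."""
    chars, origin = [], []
    for i, ch in enumerate(text):
        for part in unicodedata.normalize("NFKD", ch):
            chars.append(part)
            origin.append(i)
    return "".join(chars), origin

def whole_char_search(text, query):
    """First match that does not start or end inside an expanded
    character; returns the (begin, end) span in the original text."""
    expanded, origin = expand(text)
    pos = expanded.find(query) if query else -1
    while pos >= 0:
        end = pos + len(query)
        starts_clean = pos == 0 or origin[pos - 1] != origin[pos]
        ends_clean = end == len(expanded) or origin[end - 1] != origin[end]
        if starts_clean and ends_clean:
            return origin[pos], origin[end - 1] + 1
        pos = expanded.find(query, pos + 1)  # try the next candidate
    return None

# Line 1 contains the ligature, line 2 two plain "f" characters.
buffer = "su\ufb00er\nsuffer"

print(whole_char_search(buffer, "suf"))    # (6, 9): match on line 2
print(whole_char_search(buffer, "suff"))   # (0, 3): match moves back to line 1
```

With this rule, extending the search string from "suf" to "suff" moves the first match from position 6 back to position 0.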

>> I have no idea how many mappings like "ß" -> "ss" exist.  The problem is
>> that we don't get them from UnicodeData.txt IIUC.
>
> I can't find them in UnicodeData.txt either.  Looking at the files in
> http://www.unicode.org/Public/UNIDATA/ I can find them in the file
>
> http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
>
> that is derived from
>
> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt

Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
was talking about the (weak?) equivalence of "ß" and "ss" (lower case
"s"), which is much more important when searching, particularly so
because many German words that were earlier written with an "ß" are now
written with "ss".
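For comparison, here is what Python's built-in Unicode support does with "ß", followed by a sketch of a hand-maintained search-only equivalence.  The table SEARCH_EQUIVALENTS and the function search_key are assumptions made up for this illustration.  They are not taken from any Unicode data file or from Emacs.

```python
import unicodedata

sharp_s = "\u00df"  # the character "ß"

# Upper-casing yields two capital letters.
print(sharp_s.upper())                                    # SS
# Python's full case folding yields two lower-case letters.
print(sharp_s.casefold())                                 # ss
# There is no decomposition, so normalization leaves it unchanged.
print(unicodedata.normalize("NFKD", sharp_s) == sharp_s)  # True

# Hypothetical search-only equivalences, kept separate from case
# conversion so that they can be extended by hand.
SEARCH_EQUIVALENTS = {"\u00df": "ss"}

def search_key(text):
    """Map each character to its search equivalent, if it has one."""
    return "".join(SEARCH_EQUIVALENTS.get(ch, ch) for ch in text)

# Old and new spellings compare equal under the search key.
print(search_key("da\u00df") == search_key("dass"))       # True
```

Keeping such a table apart from case conversion is only one possible design.  It avoids touching upper-case "SS" when all that is wanted is to find old and new spellings together.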

martin





