Re: Character folding in the pretest


From: Marcin Borkowski
Subject: Re: Character folding in the pretest
Date: Mon, 08 Feb 2016 20:18:48 +0100
User-agent: mu4e 0.9.13; emacs 25.1.50.1

On 2016-02-08, at 18:48, Eli Zaretskii <address@hidden> wrote:

>> From: Marcin Borkowski <address@hidden>
>> Date: Mon, 08 Feb 2016 15:05:05 +0100
>> Cc: address@hidden, address@hidden, address@hidden
>> 
>> Just as another datapoint in discussion: for me, searching for "l" and
>> finding "ł" seems a bit weird.  (The opposite even more so.)
>
> Which is why neither one happens under character folding.
>
>> BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
>> isearching for "a" (with character folding on) finds "ą".  Whatever one
>> thinks about char folding, this is clearly a bug.
>
> It's not a bug, it's the feature working as designed: we only fold
> characters that have suitable decompositions in the Unicode Character
> Database.  So:
>
>   (get-char-code-property ?ą 'decomposition) => (97 808)
>
> but
>
>   (get-char-code-property ?ł 'decomposition) => (322)
>
> IOW, ą is canonically equivalent to the 2-character sequence a ̨ (which
> is why searching for a finds that character), while ł has no canonical
> decomposition (nor any other decomposition).
>
> This means that the Unicode guys decided that ł should not be
> equivalent to any other sequence of characters, and therefore Emacs
> doesn't find it unless you search for it literally.
>
> If you want to know why ł doesn't have any decompositions, I suggest
> to ask on the Unicode mailing list, I'm sure they had good reasons,
> most probably reasons that came from people who are experts in the
> Polish language and its intricacies.  We just trust the results.
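
The decomposition data Eli quotes can be cross-checked outside Emacs. Below is a minimal sketch using Python's standard `unicodedata` module; the helper name `decomposition` is made up for illustration. One difference to keep in mind: for a character without a decomposition, Emacs returns a list holding the character itself, (322) for ł, whereas this helper returns an empty list.

```python
import unicodedata


def decomposition(ch):
    """Return the UCD decomposition of ch as a list of code points.

    The list is empty when the character has no decomposition.
    """
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>";
    # drop it so that only the code points remain.
    return [int(field, 16) for field in fields if not field.startswith("<")]


print(decomposition("\u0105"))  # U+0105, a with ogonek: [97, 808]
print(decomposition("\u0142"))  # U+0142, l with stroke: []
```

This only confirms what the Unicode Character Database says; it shows nothing about how isearch uses that data.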

Thanks for the explanation, Eli!

However, given the number of bugs/quirks in Unicode, I'd personally
prefer not to trust them too much.  (Though I understand that the Emacs
devs /have/ to trust someone, and choosing the Unicode people is
probably not a bad idea generally.)  Funnily, one of the more annoying
bugs in Unicode is connected with quotes, AFAIR.  (Why not beat a dead
horse? ;-))  And folding "ą" to "a" but not "ł" to "l" is something
which most Poles (I guess) would treat as a serious, WTF-level bug.  And
good luck to all non-Polish people with isearching for the name of Jan
Łukasiewicz (just to choose a Lisp-related name;-)).

Yet another data point suggesting that the issue is really complicated,
and that Drew is right: if this is not configurable by users, it might
end up more annoying than helpful.  (That is not to say it won't be
configurable; I trust Artur here.)
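
To make the configurability point concrete, here is a rough sketch of decomposition-based folding with a user-supplied override table. It is not how Emacs implements character folding; the names `fold` and `EXTRA_FOLDS` are hypothetical, and the override entries are just one choice a Polish user might make.

```python
import unicodedata

# Hypothetical user-supplied extras for letters that have no decomposition
# in the Unicode Character Database.
EXTRA_FOLDS = {"\u0142": "l", "\u0141": "L"}  # l with stroke, L with stroke


def fold(text, extra=None):
    """Strip combining marks after canonical decomposition.

    Characters listed in `extra` are replaced first, so the user can add
    folds that the Unicode data does not provide.
    """
    extra = extra or {}
    out = []
    for ch in text:
        ch = extra.get(ch, ch)
        # NFD splits a precomposed letter into base letter + combining marks;
        # unicodedata.combining() is non-zero for the combining marks.
        out.extend(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
    return "".join(out)


print(fold("\u0105"))                                # a
print(fold("\u0141ukasiewicz"))                      # unchanged: no decomposition
print(fold("\u0141ukasiewicz", extra=EXTRA_FOLDS))   # Lukasiewicz
```

With the default data alone, "Łukasiewicz" stays as it is; only the user-provided table makes it reachable by typing a plain "L".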

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University


