emacs-devel

Re: On language-dependent defaults for character-folding


From: Eli Zaretskii
Subject: Re: On language-dependent defaults for character-folding
Date: Sat, 20 Feb 2016 12:44:01 +0200

> Date: Sat, 20 Feb 2016 18:08:20 +0800
> From: Elias Mårtenson <address@hidden>
> Cc: Lars Ingebrigtsen <address@hidden>, emacs-devel <address@hidden>
> 
>  It is possible that you only see the "equivalence" parts of all these
>  sources. But in that case, you are actually claiming that folding
>  characters should never be done at all! "Folding" means mapping
>  _distinct_ character sequences to the same basic sequence. You start
>  from a normalization form, then compare the results disregarding
>  certain secondary, tertiary, etc. differences.
> 
> Of course. But the fact that you start from a normalisation form is of
> secondary relevance here. I'm thinking that perhaps repeating the fact
> that the normalised form is used has somewhat clouded the discussion.
> 
> When you say "ignoring [...] differences", how do you determine those 
> differences?
> 
>  > Again (I really apologise for repeating myself, I'm starting to sound
>  > like a troll and that is truly not my intention), the purpose of
>  > normalisation forms is to ensure that the two variants of ñ compare
>  > the same. They are not designed to provide a mechanism to allow n to
>  > compare equal to ñ.
> 
>  Under character-folding that ignores diacritics, ñ should indeed
>  compare equal to n.
> 
> 
> Yes again. But how do you determine what rules to apply?

Emacs currently ignores _any_ non-base differences, so the rule is
simple: we disregard every character in the decomposition except the
first one, which is the base character.
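
To make the rule concrete, here is a minimal sketch in Python (not the Emacs code, which is written in Lisp). It assumes the standard `unicodedata` module; the names `fold_to_base` and `fold_equal` are hypothetical and exist only for illustration. The text is first composed to NFC so that both variants of ñ become the same precomposed character, and then each character is replaced by the first character of its canonical decomposition. A base followed by a combining mark that has no precomposed form is left alone by this sketch.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Keep only the first character of each character's canonical decomposition.

    Illustrative only: composing to NFC first means "n" + U+0303 and the
    precomposed U+00F1 are treated alike before the base is extracted.
    """
    composed = unicodedata.normalize("NFC", text)
    # NFD of a single character is never empty, so [0] is always safe here.
    return "".join(unicodedata.normalize("NFD", ch)[0] for ch in composed)


def fold_equal(a: str, b: str) -> bool:
    """Compare two strings while disregarding every non-base difference."""
    return fold_to_base(a) == fold_to_base(b)
```

Under this sketch, `fold_equal("mañana", "manana")` holds, which is the "ñ compares equal to n" behaviour discussed above.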

Further improvements in this direction will need to access additional
Unicode properties (to properly order the combining marks), and
perhaps additional tables.  But this is something to consider in the
future, and it will have to be done in C anyway; the regexp-based
implementation cannot cut it.
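
As an illustration of why the ordering of combining marks matters, the sketch below uses Python's `unicodedata` module (an assumption for the example; Emacs would consult its own Unicode tables). The property involved is the canonical combining class: marks with different classes are sorted during normalization, so two sequences that differ only in mark order end up identical. The helper names are hypothetical.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, canonical combining class 230


def combining_classes(text: str) -> list:
    """Return the canonical combining class of each code point after NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.combining(ch) for ch in decomposed]


def same_after_reordering(a: str, b: str) -> bool:
    """True if the two strings are identical once NFD has ordered their marks."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

Here `same_after_reordering("a" + DOT_ABOVE + DOT_BELOW, "a" + DOT_BELOW + DOT_ABOVE)` holds, because normalization places the class-220 mark before the class-230 one in both cases.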

>  > That's a good idea actually.
> 
>  That's a relief. I was beginning to suspect I don't have any good
>  ideas at all.
> 
> Apparently I have given the impression that I think your ideas are
> garbage. I profoundly apologise for this and will try to be better
> going forward.

My smilies are usually implicit, so no sweat.


