emacs-devel

Re: On language-dependent defaults for character-folding


From: Juri Linkov
Subject: Re: On language-dependent defaults for character-folding
Date: Mon, 29 Feb 2016 02:22:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.91 (x86_64-pc-linux-gnu)

> What I envisioned is a single variable that holds a list of folding
> sub-features.  Examples include ignoring diacritics, matching
> ligatures and their decompositions, "controversial" foldings that
> users of specific languages might not want, etc.  The default value
> will hold all of the sub-features; users that don't want some of them
> will be able to remove them from the list, which will affect the
> mapping at search time.  We could also have a setting that means "DTRT
> for my locale", which will remove the sub-features inappropriate for
> the locale's language.  Stuff like that.

Something like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...))?

I'm not sure such terms are self-descriptive.  At least plain pairs like
'((o ø) (l ł) ...) should be enough for customization at the base-character
level, and later we might consider grouping such pairs into higher-level
features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.

>> So we could have at least one default internal variable containing all
>> decompositions from UnicodeData.txt plus decompositions from decomps.txt
>> minus locale-dependent mappings.
>
> Internally, we need a translation table for mapping equivalent
> characters.  This table should be recomputed (or selected among
> several precomputed ones) according to the list of sub-features that
> the user requested.

Or maybe customizing a variable like ‘char-fold-language’ (a defcustom
whose default depends on the user's locale) could recompute the table
when the modified value is saved.
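
One way to read the ‘char-fold-language’ idea, sketched in Emacs Lisp
under stated assumptions: the option my-char-fold-language, the pair and
exclusion data, and the helper my-char-fold-compute-table are all
hypothetical; only defcustom's :set keyword, current-language-environment
and make-char-table are real.  The :set function runs when the defcustom
is first evaluated and again whenever the value is set or saved through
Customize, so the translation table is recomputed from the current value.

```elisp
;;; Hypothetical sketch: every `my-' name and all data are illustrative.
(require 'seq)

(defvar my-char-fold-all-pairs '((?o ?ø ?ö) (?a ?å ?ä))
  "Hypothetical full list of (BASE VARIANT...) folding pairs.")

(defvar my-char-fold-language-exclusions '(("Swedish" ?å ?ä ?ö))
  "Hypothetical alist of (LANGUAGE CHAR...): variants that must not fold.")

(defvar my-char-fold-table nil
  "Char-table computed from `my-char-fold-language'.")

(defun my-char-fold-compute-table (language)
  "Return a char-table mapping BASE to (BASE VARIANT...).
Variants excluded for LANGUAGE are dropped; a BASE left without
variants gets no entry."
  (let ((excluded (cdr (assoc language my-char-fold-language-exclusions)))
        (table (make-char-table 'my-char-fold)))
    (dolist (pair my-char-fold-all-pairs table)
      (let ((variants (seq-remove (lambda (c) (memq c excluded))
                                  (cdr pair))))
        (when variants
          (aset table (car pair) (cons (car pair) variants)))))))

(defcustom my-char-fold-language current-language-environment
  "Hypothetical option: language whose folding conventions to use.
The default follows the current language environment."
  :type 'string
  :group 'matching
  ;; Called at definition time (via the default :initialize) and on
  ;; every Customize set or save, so the table tracks the value.
  :set (lambda (symbol value)
         (set-default symbol value)
         (setq my-char-fold-table (my-char-fold-compute-table value))))
```

After (customize-set-variable 'my-char-fold-language "Swedish"),
(aref my-char-fold-table ?a) is nil, so a no longer matches å or ä,
while a language without exclusions keeps the full set.  Selecting among
several precomputed tables instead would trade memory for skipping the
recomputation.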

>> >   http://unicode.org/Public/UCA/latest/decomps.txt
>> >
>> > (The last release of Unicode is v8.0.)
>>
>> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
>> 1600 differences (such as ł decomposed to l and ̵, and ø to o and ̸)
>> that we need to add manually (the whole set of differences is attached below):
>
> I think we need to create another uni-*.el file which defines a
> decomposition char-table populated from decomps.txt.

The name of the currently used Unicode character property is “decomposition”.
What would be a good name for the property from decomps.txt? “decomposition2”?
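
To make the comparison concrete: for a character that UnicodeData.txt
does not decompose, get-char-code-property returns the one-element list
(CHAR).  A minimal sketch, assuming a separate, hypothetical char-table
(my-decomps-extra, filled by hand here; a generated uni-*.el file would
fill it from decomps.txt) that is consulted only when the existing
“decomposition” property is trivial.  The two entries copy the ł and ø
examples above, assuming the combining marks shown there are U+0335
COMBINING SHORT STROKE OVERLAY and U+0338 COMBINING LONG SOLIDUS OVERLAY.

```elisp
;;; Hypothetical sketch: every `my-' name is illustrative.
(defun my-trivial-decomposition-p (char)
  "Non-nil if UnicodeData.txt gives CHAR no decomposition.
In that case `get-char-code-property' returns the list (CHAR)."
  (equal (get-char-code-property char 'decomposition) (list char)))

(defvar my-decomps-extra
  (let ((table (make-char-table 'my-decomps-extra)))
    ;; Hand-entered for illustration only; a generated uni-*.el file
    ;; would populate the table from decomps.txt instead.
    (aset table ?ł (list ?l #x335))
    (aset table ?ø (list ?o #x338))
    table)
  "Hypothetical char-table of decompositions missing from UnicodeData.txt.")

(defun my-decomposition (char)
  "Return CHAR's decomposition, preferring UnicodeData.txt."
  (if (my-trivial-decomposition-p char)
      (or (aref my-decomps-extra char) (list char))
    (get-char-code-property char 'decomposition)))
```

Keeping the supplementary data in its own char-table leaves the existing
“decomposition” property untouched, so code that relies on the
UnicodeData.txt values is unaffected.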
