Re: On language-dependent defaults for character-folding
From: Juri Linkov
Subject: Re: On language-dependent defaults for character-folding
Date: Mon, 29 Feb 2016 02:22:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.91 (x86_64-pc-linux-gnu)
> What I envisioned is a single variable that holds a list of folding
> sub-features. Examples include ignoring diacritics, matching
> ligatures and their decompositions, "controversial" foldings that
> users of specific languages might not want, etc. The default value
> will hold all of the sub-features; users that don't want some of them
> will be able to remove them from the list, which will affect the
> mapping at search time. We could also have a setting that means "DTRT
> for my locale", which will remove the sub-features inappropriate for
> the locale's language. Stuff like that.
Something like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...) ...)?
I'm not sure such terms are self-descriptive. At the very least, plain pairs like
'((o ø) (l ł) ...) should be enough to customize folding at the base-character level,
and later we might consider grouping such pairs into higher-level
features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.
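
To make the pair-based idea concrete, here is a minimal sketch. Every name in it (`my-char-fold-pairs`, `my-char-fold-regexp`) is hypothetical and not part of char-fold.el, and it assumes the pairs are stored as characters rather than symbols so they can be used directly in a regexp.

```elisp
;; Hypothetical sketch; none of these names exist in Emacs.
(defcustom my-char-fold-pairs
  '((?o ?ø) (?l ?ł))
  "List of (BASE VARIANT...) entries.
Searching for BASE should also match each VARIANT."
  :type '(repeat (repeat character))
  :group 'matching)

(defun my-char-fold-regexp (char)
  "Return a regexp matching CHAR and any variants listed for it."
  (let ((variants (cdr (assq char my-char-fold-pairs))))
    (if variants
        ;; `regexp-opt' takes strings, so convert each character first.
        (regexp-opt (mapcar #'string (cons char variants)))
      (regexp-quote (string char)))))
```

A user who does not want "o" to match "ø" would simply remove that entry from the list; higher-level groups could later expand to sets of such entries.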
>> So we could have at least one default internal variable containing all
>> decompositions from UnicodeData.txt plus decompositions from decomps.txt
>> minus locale-dependent mappings.
>
> Internally, we need a translation table for mapping equivalent
> characters. This table should be recomputed (or selected among
> several precomputed ones) according to the list of sub-features that
> the user requested.
Or maybe customizing a variable like (defcustom char-fold-language ...)
(with the default depending on the user's locale) could recompute
the table when the modified value is saved.
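
One way this could look is a `defcustom` with a `:set` function, sketched below. The names `my-char-fold-language`, `my-char-fold-table` and `my-char-fold-rebuild-table` are hypothetical, the rebuild function is only a stub, and deriving the default from the LANG environment variable is an assumption about how the locale would be detected.

```elisp
;; Hypothetical sketch; none of these names exist in Emacs.
(defvar my-char-fold-table nil
  "Folding table, rebuilt whenever `my-char-fold-language' is set.")

(defun my-char-fold-rebuild-table (language)
  "Rebuild `my-char-fold-table' for LANGUAGE (stub)."
  ;; A real implementation would fill the table from the decomposition
  ;; data, skipping mappings that are inappropriate for LANGUAGE.
  (ignore language)
  (setq my-char-fold-table (make-char-table 'my-char-fold)))

(defcustom my-char-fold-language
  ;; Assumption: take the language part of LANG, e.g. "sv" from "sv_SE.UTF-8".
  (car (split-string (or (getenv "LANG") "en") "[_.@]"))
  "Language whose folding conventions should be used."
  :type 'string
  :group 'matching
  :set (lambda (symbol value)
         (set-default symbol value)
         (my-char-fold-rebuild-table value)))
```

Note that the `:set` function runs when the value is changed through Customize or `customize-set-variable`, but not on a plain `setq`, so users setting the variable by hand would need to call the rebuild function themselves.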
>> > http://unicode.org/Public/UCA/latest/decomps.txt
>> >
>> > (The last release of Unicode is v8.0.)
>>
>> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
>> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
>> we need to add manually (a whole set of differences is attached below):
>
> I think we need to create another uni-*.el file which defines a
> decomposition char-table populated from decomps.txt.
The name of the currently used Unicode character property is “decomposition”.
What would be a good name for the property from decomps.txt? “decomposition2”?
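
For reference, this is how the existing property is queried today; a property populated from decomps.txt would presumably be looked up the same way under whatever name is chosen. The helper name `my-char-decomposition` is hypothetical.

```elisp
;; Hypothetical helper around the existing `decomposition' property.
(defun my-char-decomposition (char)
  "Return CHAR's `decomposition' property without any formatting tag.
The property value is a list of characters, optionally preceded by a
symbol naming a compatibility formatting tag."
  (let ((dec (get-char-code-property char 'decomposition)))
    (if (symbolp (car dec))
        (cdr dec)
      dec)))
```

Per the Emacs Lisp manual, a character without a decomposition in UnicodeData.txt yields a list containing only the character itself, which is the case for ø and ł that decomps.txt would cover.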