Re: On language-dependent defaults for character-folding
From: Eli Zaretskii
Subject: Re: On language-dependent defaults for character-folding
Date: Thu, 25 Feb 2016 18:24:08 +0200
> From: Juri Linkov <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden, address@hidden
> Date: Thu, 25 Feb 2016 02:29:11 +0200
>
> >> >> It seems two user variables are necessary for customization:
> >> >>
> >> >> 1. inclusive folding groups that will include by default such pairs
> >> >> as o - ø, l - ł added to the Unicode decomposition-based rules,
> >> >> and allow the users to add more rules;
> >> >>
> >> >> 2. exclusive folding groups to exclude locale/language-dependent rules
> >> >> from the default mappings above, e.g. removing n - ñ for the "es" locale.
> >> >
> >> > I think we should add those in item 1 unconditionally (i.e. include
> >> > them in the default mappings), and then exclude some of them under the
> >> > rules you describe in item 2. Then the problem becomes easier, as we
> >> > only need to filter out some mappings, as determined by a single user
> >> > variable (whose default can come from the user locale).
> >>
> >> Better to have 4 variables (2 internal + 2 user customizable variables):
> >
> > Can you explain why it's better to have 4 variables rather than just
> > one?
>
> If you mean that one customizable variable should contain all mappings from
> UnicodeData.txt and decomps.txt presented to the user for customization,
> such a list will be too huge to customize: there are 5721 decompositions
> in UnicodeData.txt, and 6674 decompositions in decomps.txt.
No, of course not. That would be extremely inconvenient.

What I envisioned is a single variable that holds a list of folding
sub-features. Examples include ignoring diacritics, matching
ligatures and their decompositions, "controversial" foldings that
users of specific languages might not want, etc. The default value
will hold all of the sub-features; users who don't want some of them
will be able to remove them from the list, which will affect the
mapping at search time. We could also have a setting that means "DTRT
for my locale", which will remove the sub-features inappropriate for
the locale's language. Stuff like that.
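
To make the idea concrete, here is a rough sketch (in Python rather than Emacs Lisp, purely for illustration). Every name in it is hypothetical: the sub-feature names, the per-language exclusion table, and the "locale" sentinel are assumptions, not an existing Emacs interface.

```python
# Hypothetical sub-feature names; the thread does not fix an actual list.
ALL_FEATURES = ["diacritics", "ligatures", "controversial"]

# Assumed per-language exclusions. Here the foldings that Spanish users
# object to (such as n matching ñ) are assumed to sit in "controversial".
LOCALE_EXCLUSIONS = {"es": {"controversial"}}


def effective_features(setting, language=None):
    """Resolve the user's setting into the list of active sub-features.

    `setting` is either a list of feature names or the string "locale",
    which stands for "do the right thing for my locale".
    """
    if setting == "locale":
        excluded = LOCALE_EXCLUSIONS.get(language, set())
        return [f for f in ALL_FEATURES if f not in excluded]
    # Keep the canonical order and silently ignore unknown names.
    return [f for f in ALL_FEATURES if f in setting]


print(effective_features(ALL_FEATURES))    # default: everything enabled
print(effective_features("locale", "es"))  # locale-driven filtering
print(effective_features(["ligatures"]))   # user removed the rest
```

The point of the design is that the user only ever edits a short list of names; the thousands of individual mappings stay internal.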
> So we could have at least one default internal variable containing all
> decompositions from UnicodeData.txt plus decompositions from decomps.txt
> minus locale-dependent mappings.
Internally, we need a translation table for mapping equivalent
characters. This table should be recomputed (or selected among
several precomputed ones) according to the list of sub-features that
the user requested.
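
A minimal sketch of such a recomputation, again in Python and under stated assumptions: the two folding functions and the `FOLDERS` registry are hypothetical stand-ins for the sub-features, and the table is built only over a small sample of characters rather than the whole Unicode range.

```python
import unicodedata


def fold_diacritics(ch):
    """Drop combining marks from the canonical decomposition of `ch`."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base or ch


def fold_ligatures(ch):
    """Expand characters with a <compat> decomposition, e.g. U+FB01 to 'fi'."""
    if unicodedata.decomposition(ch).startswith("<compat>"):
        return unicodedata.normalize("NFKD", ch)
    return ch


# Hypothetical registry: sub-feature name -> per-character folding function.
FOLDERS = {"diacritics": fold_diacritics, "ligatures": fold_ligatures}


def build_table(features, chars):
    """Recompute the translation table for the requested sub-features."""
    table = {}
    for ch in chars:
        folded = ch
        for name in features:
            folded = "".join(FOLDERS[name](c) for c in folded)
        if folded != ch:
            table[ord(ch)] = folded
    return table


def fold(text, table):
    """Apply the table; characters without an entry pass through unchanged."""
    return text.translate(table)


sample = "\u00e9\u00f1\ufb01\u00f8"  # é, ñ, the fi ligature, ø
print(fold("\ufb01anc\u00e9", build_table(["diacritics", "ligatures"], sample)))
print(fold("\ufb01anc\u00e9", build_table(["ligatures"], sample)))
```

Note that ø gets no entry here, because it has no decomposition in UnicodeData.txt; that is the kind of gap the decomps.txt data is meant to fill.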
> > http://unicode.org/Public/UCA/latest/decomps.txt
> >
> > (The last release of Unicode is v8.0.)
>
> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
> we need to add manually (a whole set of differences is attached below):
I think we need to create another uni-*.el file which defines a
decomposition char-table populated from decomps.txt.
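
Roughly, populating such a table could look like the sketch below (Python, for illustration only). The semicolon-separated layout of the sample lines, with the code point first and the decomposition last, is an assumption that should be checked against the real decomps.txt; the `<tag>` field is a placeholder, and `parse_decomps` and `base_char` are hypothetical names.

```python
# Illustrative lines only. The layout (code point; tag; decomposition) is an
# assumption, and "<tag>" is a placeholder rather than a real tag value.
SAMPLE = """\
# comment lines and trailing comments are skipped
0142;<tag>;006C 0335  # l with stroke -> l + U+0335
00F8;<tag>;006F 0338  # o with stroke -> o + U+0338
"""


def parse_decomps(text):
    """Build a mapping from a character to its decomposition string."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        code, mapping = fields[0], fields[-1]
        table[chr(int(code, 16))] = "".join(
            chr(int(cp, 16)) for cp in mapping.split()
        )
    return table


def base_char(ch, table):
    """Return the first character of the decomposition, or `ch` itself."""
    return table.get(ch, ch)[0]


table = parse_decomps(SAMPLE)
print(base_char("\u0142", table), base_char("\u00f8", table))
```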