emacs-devel

Re: On language-dependent defaults for character-folding


From: Juri Linkov
Subject: Re: On language-dependent defaults for character-folding
Date: Thu, 25 Feb 2016 02:29:11 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.91 (x86_64-pc-linux-gnu)

>> >> It seems two user variables are necessary for customization:
>> >>
>> >> 1. inclusive folding groups that will include by default such pairs
>> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>> >>    and allow the users to add more rules;
>> >>
>> >> 2. exclusive folding groups to exclude locale/language-dependent rules
>> >>    from the default mappings above, e.g. removing n - ñ for the "es"
>> >>    locale.
>> >
>> > I think we should add those in item 1 unconditionally (i.e. include
>> > them in the default mappings), and then exclude some of them under the
>> > rules you describe in item 2.  Then the problem becomes easier, as we
>> > only need to filter out some mappings, as determined by a single user
>> > variable (whose default can come from the user locale).
>> 
>> Better to have 4 variables (2 internal + 2 user customizable variables):
>
> Can you explain why it's better to have 4 variables rather than just
> one?

If you mean that a single customizable variable should contain all mappings
from UnicodeData.txt and decomps.txt and present them to the user for
customization, such a list would be far too large to customize: there are
5721 decompositions in UnicodeData.txt, and 6674 decompositions in decomps.txt.

So we could have at least one default internal variable containing all
decompositions from UnicodeData.txt plus decompositions from decomps.txt
minus locale-dependent mappings.

Then 2 user-customizable variables should be enough: one will allow
users to add a mapping to the default list, and the other to remove
a mapping from the default list.
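
As a rough illustration of how such a scheme could combine the three lists,
here is a minimal sketch in Python (not Emacs Lisp). All names in it
(DEFAULT_FOLDS, effective_folds, and the sample entries) are hypothetical and
do not correspond to any existing Emacs variable; the sample data is only a
tiny stand-in for the real decomposition tables.

```python
# Hypothetical names throughout; none of these are existing Emacs variables.

# Internal default list: base character -> characters that fold to it.
# In the scheme described above this would be generated from the
# decomposition data, not written by hand.
DEFAULT_FOLDS = {
    "o": {"ó", "ö", "ø"},
    "l": {"ł"},
    "n": {"ñ"},
}


def effective_folds(default, user_include, user_exclude):
    """Return (default + user_include) - user_exclude as a new dict.

    The inputs are left unmodified.
    """
    result = {base: set(chars) for base, chars in default.items()}
    # First apply the user's additions to the default list.
    for base, chars in user_include.items():
        result.setdefault(base, set()).update(chars)
    # Then apply the user's removals; drop bases left with no mappings.
    for base, chars in user_exclude.items():
        if base in result:
            result[base] -= set(chars)
            if not result[base]:
                del result[base]
    return result


# Example: a user adds d - đ and removes n - ñ.
folds = effective_folds(DEFAULT_FOLDS, {"d": {"đ"}}, {"n": {"ñ"}})
```

Applying removals after additions means an explicit removal always wins,
which is one possible design choice, not something settled in this thread.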

>> It would be good to find all differences between UnicodeData.txt and
>> decomps.txt.  Is this the latest version?
>> http://unicode.org/Public/UCA/6.3.0/decomps.txt
>
> No, the latest is always here:
>
>   http://unicode.org/Public/UCA/latest/decomps.txt
>
> (The last release of Unicode is v8.0.)

Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
1600 differences (such as ł decomposed to l plus ̵, and ø to o plus ̸)
that we need to add manually (the full set of differences is attached below):

Attachment: UnicodeData_decomps.diff
Description: Text Data
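
To see why characters such as ł and ø need to be added by hand, the following
Python sketch queries the decomposition field of the Unicode Character
Database through the standard unicodedata module (which reflects the
Unicode version bundled with the interpreter, not necessarily v8.0). The
helper name base_and_marks is hypothetical. It only inspects the
UnicodeData.txt-style data; it does not read decomps.txt.

```python
import unicodedata


def base_and_marks(ch):
    """Split CH into (base, trailing characters) via its canonical decomposition.

    Return None when the character has no decomposition, or only a
    compatibility decomposition (one tagged like "<compat>").
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return None
    points = [chr(int(cp, 16)) for cp in decomp.split()]
    return points[0], "".join(points[1:])


# ñ has a canonical decomposition, so a decomposition-based rule covers it.
covered = base_and_marks("ñ")

# ł and ø have no decomposition in this data, so a rule derived only from it
# would never fold them to l and o.
missing = [ch for ch in "łø" if base_and_marks(ch) is None]
```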

