bug-gnu-emacs
bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.


From: Eli Zaretskii
Subject: bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.
Date: Thu, 07 May 2015 17:44:25 +0300

> From: Ivan Shmakov <ivan@siamics.net>
> Date: Thu, 07 May 2015 10:00:38 +0000
> 
> > Although I suppose it comes from a decomposition table, I don't know
> > what the table was designed for, and it's not clear to me how it's
> > relevant.
> 
>         I hope someone more knowledgeable could comment on this.

I'm not sure I'm your man, or what needs to be commented on, but I
will try nonetheless ;-)

The 'decomposition property of a character (like every other property
accessed by get-char-code-property) comes directly from the Unicode
Character Database.  In this case, you will see that some characters
in UnicodeData.txt have this field non-empty:

  1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;
                                                   ^^^^^^^^^
This gives the so-called "canonical decomposition" of the character;
in this case, we are told that U+1E99's decomposition is a sequence of
U+0079 (lower-case y) followed by U+030A (combining ring above).
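
To make the field layout concrete, here is a minimal sketch in Python
(not Emacs Lisp) that pulls the decomposition out of the record shown
above.  The helper name parse_decomposition is hypothetical; the only
assumption about the file format is that the decomposition is the
sixth semicolon-separated field, as in the example line.

```python
# Sketch only: parse_decomposition is a hypothetical helper, not part of
# Emacs or of any Unicode library.

def parse_decomposition(record):
    """Return (codepoint, tag, mapping) for one UnicodeData.txt record."""
    fields = record.strip().split(";")
    codepoint = int(fields[0], 16)
    parts = fields[5].split()      # sixth field holds the decomposition
    tag = None
    # A leading <tag>, when present, marks a non-canonical mapping.
    if parts and parts[0].startswith("<"):
        tag = parts[0].strip("<>")
        parts = parts[1:]
    mapping = [int(p, 16) for p in parts]
    return codepoint, tag, mapping

RECORD = "1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;"

codepoint, tag, mapping = parse_decomposition(RECORD)
print(hex(codepoint), tag, [hex(m) for m in mapping])
```

A record whose sixth field is empty yields an empty mapping, which is
the case for characters that have no decomposition at all.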

Some characters have "compatibility decompositions" instead, like
this:

  1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;
                                                       ^^^^^^^^^^^^^^^^^^
which is useful for collation-driven sorting and for loose comparisons
a-la string-collate-lessp.
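
As a rough illustration of the difference between the two kinds of
decomposition, the sketch below uses Python's standard unicodedata
module, which exposes the same UnicodeData.txt field.  It is not how
Emacs implements anything; loose_key is a hypothetical helper, and
using NFKD normalization as a "loose comparison" key is my assumption
for the sake of the example.

```python
import unicodedata

# The two characters discussed above.
Y_RING = "\u1e99"       # LATIN SMALL LETTER Y WITH RING ABOVE
A_HALF_RING = "\u1e9a"  # LATIN SMALL LETTER A WITH RIGHT HALF RING

for ch in (Y_RING, A_HALF_RING):
    decomp = unicodedata.decomposition(ch)
    # A leading <tag> marks a compatibility decomposition.
    kind = "compatibility" if decomp.startswith("<") else "canonical"
    print(f"U+{ord(ch):04X} {kind}: {decomp}")

def loose_key(s):
    """Hypothetical helper: key for loose comparison via NFKD."""
    return unicodedata.normalize("NFKD", s)

# Canonical normalization (NFD) expands only canonical decompositions,
# so U+1E9A is left alone; NFKD expands compatibility ones as well.
print(unicodedata.normalize("NFD", A_HALF_RING) == A_HALF_RING)
print(loose_key(A_HALF_RING) == "a\u02be")
```

The point of the sketch is only that a compatibility mapping is
ignored by canonical normalization and applied by the "K" forms, which
is why it matters for loose matching rather than for exact equivalence.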

For more details about this, see http://unicode.org/reports/tr44/, the
Unicode Standard Annex (UAX #44) that describes the Unicode Character
Database.




