Re: Different names for Unicode codepoint
From: Eli Zaretskii
Subject: Re: Different names for Unicode codepoint
Date: Thu, 21 Apr 2016 22:40:17 +0300
> From: Lele Gaifax <lele@metapensiero.it>
> Date: Thu, 21 Apr 2016 21:04:32 +0200
> Cc: python-list@python.org
>
> is there a particular reason for the slightly different names that Emacs
> (version 25.0.92) and Python (version 3.6.0a0) give to a single Unicode
> entity?
They don't.
> Just to mention one codepoint, ⋖ is called "LESS THAN WITH DOT" according to
> Emacs' C-x 8 RET TAB menu, while in Python:
>
> >>> import unicodedata
> >>> unicodedata.name('⋖')
> 'LESS-THAN WITH DOT'
> >>> print("\N{LESS THAN WITH DOT}")
> File "<stdin>", line 1
> SyntaxError: (unicode error) ...: unknown Unicode character name
Emacs shows both the "Name" and the "Old Name" properties of
characters as completion candidates, while Python evidently supports
only "Name". If you type "C-x 8 RET LESS TAB", then you will see
among the completion candidates both "LESS THAN WITH DOT" and
"LESS-THAN WITH DOT". The former is the "old name" of this character,
according to the Unicode Character Database (which is where Emacs
obtains the names and other properties of characters).
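
The Python side of this can be checked with a small sketch like the
one below. It assumes a Python 3 whose unicodedata module knows
U+22D6; the helper lookup_or_none is a hypothetical name used only
for illustration, not something Python or Emacs provides.

```python
import unicodedata

ch = "\u22d6"  # the character discussed above

# The "Name" property as reported by Python's copy of the UCD.
current_name = unicodedata.name(ch)
print(current_name)


def lookup_or_none(name):
    """Hypothetical helper: return the character for NAME, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for names it does not know.
        return None


# The current name resolves to the character.
by_current_name = lookup_or_none("LESS-THAN WITH DOT")

# The old name (without the hyphen) is not accepted by Python.
by_old_name = lookup_or_none("LESS THAN WITH DOT")

print(by_current_name == ch, by_old_name)
```

The same distinction explains the SyntaxError quoted above: the \N{...}
escape in a string literal rejects the old name just as
unicodedata.lookup does.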