bug#26477: what-cursor-position should mention "U+"

積丹尼 Dan Jacobson <jidanni@jidanni.org> wrote on Thu, 13 Apr 2017 at 13:48:

C-x = (translated from <return>) runs the command what-cursor-position
(found in global-map), which is an interactive compiled Lisp function
in ‘simple.el’.

says:

position: 120538 of 121236 (99%), column: 0
character: ○ (displayed as ○) (codepoint 9675, #o22713, #x25cb)
preferred charset: chinese-big5-1 (Frequently used part (A141-C67E) of Big5 (Chinese traditional))
code point in charset: 0x2172
script: symbol
syntax: _ which means: symbol
category: .:Base, c:Chinese, h:Korean, j:Japanese
to input: type "C-x 8 RET 25cb" or "C-x 8 RET WHITE CIRCLE"
buffer code: #xE2 #x97 #x8B
file code: #xE2 #x97 #x8B (encoded by coding system utf-8-unix)
display: by this font (glyph code)
x:-eten-fixed-medium-r-normal--16-150-75-75-c-160-big5.eten-0 (#xA1B3)

Character code properties: customize what to show
name: WHITE CIRCLE
general-category: So (Symbol, Other)
decomposition: (9675) ('○')

Why can't it say U+25CB WHITE CIRCLE
anywhere (except fragmented all over the place)?

If you ask for the reason, I guess it's just legacy. I assume `describe-char' is much older than the Unicode support in Emacs, and it was never thoroughly redesigned (the Unicode properties are all at the bottom).

My suggestion would be to replace the "(codepoint ...)" part with the standard code point description "(U+NNNN character name)", and either get rid of most of the non-Unicode properties (preferred charset, code point in charset, buffer code, file code) or move them further down. That should be a relatively simple change in the code of `describe-char'.
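
To make the proposed "(U+NNNN character name)" form concrete, here is a minimal sketch of the formatting. It is written in Python purely as an illustration and is not the code of `describe-char'; the function name describe_code_point is hypothetical, and the "<unnamed>" placeholder is an assumption for characters that have no Unicode name.

```python
import unicodedata


def describe_code_point(ch: str) -> str:
    """Return the 'U+NNNN NAME' description of a single character.

    Hypothetical helper for illustration only; not part of Emacs.
    """
    # Unicode notation: "U+" followed by at least four uppercase hex digits.
    code = f"U+{ord(ch):04X}"
    # Look up the character name; use a placeholder (an assumption of this
    # sketch) when the character has no name in the Unicode database.
    name = unicodedata.name(ch, "<unnamed>")
    return f"{code} {name}"


# The character from the report: codepoint 9675, i.e. #x25cb.
print(describe_code_point("\u25cb"))  # prints "U+25CB WHITE CIRCLE"
```

The decimal, octal and hexadecimal values in the current "(codepoint 9675, #o22713, #x25cb)" output all denote this same number, so the single U+ form loses no information.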

From:	Philipp Stephani
Subject:	bug#26477: what-cursor-position should mention "U+"
Date:	Thu, 20 Apr 2017 10:12:55 +0000