From: David De La Harpe Golden
Subject: Re: X11 Compound Text vs ISO 2022
Date: Tue, 06 Jul 2010 21:18:58 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100620 Icedove/3.0.5

On 06/07/10 17:21, James Cloos wrote:
> While testing my recently applied patch, I've discovered that Emacs will produce ISO-2022 output for COMPOUND_TEXT which other libs and apps -- notably including libX11 -- cannot decode.
>
> As an example,
>
>   (encode-coding-string "•" 'compound-text) ; U+2022 BULLET
>
> produces "^[$(address@hidden(B". '$(O' is ISO-IR 228¹, JIS X 0213:2000. But libX11 only knows about the $( charsets: 0, 1, A-D and G-M.
>
> A number of characters are output in '^[$-1', such as:
>
>   (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
>   "^[$-1\365\334^[-A"
>
>   (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
>   "^[$-1\244\333^[-A"
>
> That is encoded in mule-unicode-0100-24ff, which is essentially unknown outside Emacs. Other libs/apps prefer to use utf-8³ in compound_text for such chars.
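To check a given COMPOUND_TEXT byte string against the charset list quoted above, you can scan it for ISO 2022 designation sequences (ESC, intermediate bytes, final byte). The Python below is only a hypothetical sketch. The names `unknown_to_libx11` and `LIBX11_94N_FINALS` are made up. The accepted final bytes are copied from the quoted message, not from libX11's source. It also assumes that libX11 knows no `ESC $ -` (96^N) sets at all.

```python
import re

# Final bytes that, per the quoted message, libX11 accepts after "ESC $ (".
# Assumption: copied from the message, not verified against libX11's source.
LIBX11_94N_FINALS = set("01ABCD") | set("GHIJKLM")

# ESC, one or more intermediate bytes (0x20-0x2F), then one final byte (0x30-0x7E).
ESC_SEQ = re.compile(rb"\x1b([\x20-\x2f]+)([\x30-\x7e])")


def designations(ct: bytes):
    """Yield (intermediates, final) for each escape sequence in ct."""
    for m in ESC_SEQ.finditer(ct):
        yield m.group(1).decode("ascii"), m.group(2).decode("ascii")


def unknown_to_libx11(ct: bytes) -> list:
    """List multi-byte designations that libX11 presumably cannot decode."""
    bad = []
    for inter, final in designations(ct):
        if inter == "$(" and final not in LIBX11_94N_FINALS:
            bad.append("ESC " + inter + final)
        elif inter == "$-":
            # 96^N sets such as Emacs's private "$-1" (mule-unicode-0100-24ff);
            # assumption: libX11 knows none of these.
            bad.append("ESC " + inter + final)
    return bad


# The U+211C example from the quoted message (\365\334 octal = \xf5\xdc hex):
print(unknown_to_libx11(b"\x1b$-1\xf5\xdc\x1b-A"))  # ['ESC $-1']
```

Single-byte designations such as `ESC - A` (Latin-1 right half) are deliberately not checked here. The scan only looks at the two multi-byte cases raised in the message.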
I'm not really intimately familiar with this area (compound text seems to be a bit of a horror in these days of Unicode...).
But anyway, if Emacs isn't using one of the character sets listed in the table in sect. 4/5 of "the" spec [1], or UTF-8 as per sect. 7, it's presumably an Emacs bug, unless Emacs has successfully "registered the encoding with the X consortium" as per sect. 6 (and I don't see that happening...).
Conversely, if Emacs is sending a charset that IS listed in the table in sect. 4/5, or UTF-8 as per sect. 7, then libX11 and other apps are "at fault" if they don't recognise it.
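For comparison, below is a rough Python sketch of the approach the quoted message attributes to other libs/apps. It stays in compound text's initial state (GL = ASCII, GR = ISO 8859-1 right half) where possible and puts everything else in a UTF-8 segment instead of a private charset. It assumes the UTF-8 segment is framed as `ESC % G` ... `ESC % @`, which is our reading of the sect. 7 extension and should be checked against the spec. `encode_ctext` is a hypothetical name, not an Emacs or libX11 function.

```python
# Assumed framing for a UTF-8 segment (our reading of sect. 7):
UTF8_START = b"\x1b%G"  # ESC % G: switch to UTF-8
UTF8_END = b"\x1b%@"    # ESC % @: return to the ISO 2022 (compound text) state


def encode_ctext(text: str) -> bytes:
    """Encode text as Compound Text, staying in the initial state (GL = ASCII,
    GR = ISO 8859-1 right half) where possible and wrapping each run of
    other characters in a single UTF-8 segment.
    C0 control characters are passed through unfiltered; a real encoder must not."""
    out = bytearray()
    run = []  # pending characters that Latin-1 cannot represent

    def flush() -> None:
        if run:
            out.extend(UTF8_START + "".join(run).encode("utf-8") + UTF8_END)
            run.clear()

    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or 0xA0 <= cp <= 0xFF:
            flush()
            out.append(cp)  # ASCII (GL) or Latin-1 right half (GR), one byte each
        else:
            run.append(ch)  # C1 controls and everything above U+00FF
    flush()
    return bytes(out)


# U+2022 BULLET, the first example from the quoted message:
print(encode_ctext("a\u2022b"))  # b'a\x1b%G\xe2\x80\xa2\x1b%@b'
```

Adjacent non-Latin-1 characters share one segment, which keeps the output short. The sketch never re-designates GL or GR, so it assumes the initial state still holds after each `ESC % @`.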
[1] http://www.it.freebsd.org/pub/Unix/XFree86/WWW/htdocs/current/ctext.html
But err... the spec on freedesktop.org [2] seems a lot older, not even mentioning UTF-8???
[2] http://cgit.freedesktop.org/xorg/doc/xorg-docs/tree/specs/CTEXT/ctext.tbl.ms