emacs-devel

Re: MML charset tag regression


From: Kenichi Handa
Subject: Re: MML charset tag regression
Date: Mon, 28 Apr 2003 20:58:34 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, "James H. Cloos Jr." <address@hidden> writes:
>>>>>>  "Simon" == Simon Josefsson <address@hidden> writes:
Simon>  For me, when I yanked the string into emacs from galeon it
Simon>  becomes double-width.  It is single-width in galeon though.

> I also see that with any Cyrillic text pasted from X's
> primary selection or from the clipboard.  The wide Cyrillic
> characters are from the japanese-jisx0208 charset.
[...]

In article <address@hidden>, Simon Josefsson <address@hidden> writes:
> That may be interesting by itself.  Go to
> http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
> think).  Cut'n'paste the first word and yank it into Emacs.  It looks
> single-width in galeon, but when yanked into Emacs it becomes double
> width.  Yanking it into xterm or gnome-terminal doesn't change the
> string; it looks single-width.  Save the HTML file and open it in
> Emacs as a koi8 file (note that Emacs doesn't auto-detect it as koi8,
> so you have to do that manually); then it is single-width too.

> I guess it is the Emacs X cut'n'paste code that somehow turns the
> string into double-width Japanese characters.

I don't think so.  Emacs has no code that does such a
conversion.

I think galeon sends those Cyrillic characters to Emacs
encoded in COMPOUND_TEXT using the JIS X 0208 charset.

Please try this:

First, select some Cyrillic text in galeon.  Then type this
in Emacs: C-x RET X raw-text RET C-y.  You'll see something
like "ESC $ ( B ...".
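
A minimal Python sketch of what such bytes look like. It assumes, as the "ESC $ ( B" prefix suggests, that the client designates JIS X 0208 to G0 and switches back to ASCII with "ESC ( B" at the end. The function names are hypothetical, and Python's euc_jp codec serves only as a convenient JIS X 0208 table:

```python
# Sketch: Cyrillic text carried in COMPOUND_TEXT as JIS X 0208 characters,
# i.e. the "ESC $ ( B ..." bytes seen when yanking with raw-text.
# Assumptions: JIS X 0208 is designated to G0 (7-bit GL bytes), and the
# segment ends by switching G0 back to ASCII with "ESC ( B".

DESIGNATE_JISX0208_G0 = b"\x1b$(B"  # ESC $ ( B
DESIGNATE_ASCII_G0 = b"\x1b(B"  # ESC ( B


def jisx0208_ctext(text: str) -> bytes:
    """Encode text as a single JIS X 0208 segment of COMPOUND_TEXT."""
    # EUC-JP stores a JIS X 0208 character as two bytes in 0xA1..0xFE; clearing
    # the high bit gives the GL code (e.g. Cyrillic capital A -> 0x27 0x21).
    euc = text.encode("euc_jp")  # raises UnicodeEncodeError if unmappable
    if any(not 0xA1 <= b <= 0xFE for b in euc):
        raise ValueError("text contains characters outside JIS X 0208")
    return DESIGNATE_JISX0208_G0 + bytes(b & 0x7F for b in euc) + DESIGNATE_ASCII_G0


def decode_jisx0208_ctext(data: bytes) -> str:
    """Inverse of jisx0208_ctext: each pair of GL bytes is one JIS X 0208 character."""
    head, tail = DESIGNATE_JISX0208_G0, DESIGNATE_ASCII_G0
    if not (data.startswith(head) and data.endswith(tail)):
        raise ValueError("not a single JIS X 0208 segment")
    gl = data[len(head):len(data) - len(tail)]
    return bytes(b | 0x80 for b in gl).decode("euc_jp")


if __name__ == "__main__":
    name = "\u0413\u043e\u0440\u0431\u0430\u0447\u0435\u0432"  # "Горбачев"
    segment = jisx0208_ctext(name)
    print(segment)
    print(decode_jisx0208_ctext(segment))
```

Each Cyrillic letter comes out as a two-byte JIS X 0208 code (capital A, for instance, is 0x27 0x21). A requester that decodes this COMPOUND_TEXT faithfully therefore gets japanese-jisx0208 characters, which are displayed double-width.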

Next, try this:

First, select some Cyrillic text in galeon.  Then evaluate
this in Emacs:
   (decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)
I think you'll see single-width Cyrillic characters (you need
an iso10646-1 font containing Cyrillic glyphs).

The selection problem is very deep.  :-(

Ideally, the requester should be able to request the type
'TEXT instead of the specific 'COMPOUND_TEXT or
'UTF8_STRING, and the requestee should return the text in
the first of these types that can encode it: STRING,
COMPOUND_TEXT, or UTF8_STRING (in this priority order).
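
The priority rule above can be sketched in Python as follows. This is only an approximation. Whether text fits in COMPOUND_TEXT is modeled by a hypothetical list of Python codecs that stand in for the charsets COMPOUND_TEXT can switch between. choose_text_target is an illustrative name, not an Xlib or Emacs function:

```python
# Sketch of the "ideal" TEXT conversion: the selection owner answers a TEXT
# request with the first type that can encode the text, in the priority
# order STRING, COMPOUND_TEXT, UTF8_STRING.
#
# Assumption: COMPOUND_TEXT coverage is approximated by a hypothetical list
# of Python codecs standing in for charsets COMPOUND_TEXT can switch between.

HYPOTHETICAL_CTEXT_CODECS = (
    "latin_1", "iso8859_2", "iso8859_5", "iso8859_7", "euc_jp", "euc_kr", "gb2312",
)


def can_encode(text: str, codec: str) -> bool:
    """True if text is representable in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def choose_text_target(text: str) -> str:
    """Return the selection type a well-behaved owner would use for TEXT."""
    # STRING is ISO 8859-1 (the ICCCM also restricts control characters;
    # that detail is ignored here).
    if can_encode(text, "latin_1"):
        return "STRING"
    # COMPOUND_TEXT can switch charsets between characters, so each character
    # only needs to be representable in one of the charsets.
    if all(any(can_encode(ch, c) for c in HYPOTHETICAL_CTEXT_CODECS) for ch in text):
        return "COMPOUND_TEXT"
    return "UTF8_STRING"
```

Cyrillic text passes the COMPOUND_TEXT test through both ISO 8859-5 and the JIS X 0208 part of EUC-JP. Which of those a client actually picks inside COMPOUND_TEXT decides whether the receiver sees single-width or double-width characters.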

Unfortunately, many X clients (requestees) don't behave
like that.  If 'TEXT is requested, many of them just return
"?????" even if the text could be correctly encoded in
COMPOUND_TEXT or UTF8_STRING.

So Emacs has to request the specific type 'COMPOUND_TEXT
('UTF8_STRING was only recently introduced in XFree86, and
many clients still don't support it).

Recently, many gtk clients have started supporting
UTF8_STRING without improving their COMPOUND_TEXT support.
That may cause no problem between gtk clients, because they
request only the type UTF8_STRING.  But it's too
shortsighted an approach.  :-(

The new encoding method using the "Non-Standard Character
Set Encodings" of COMPOUND_TEXT makes the Cyrillic case much
more complicated.  In some cases (perhaps only in a KOI8
locale), X clients have recently started to encode Cyrillic
characters as "ESC % / 0 ...".  They don't consider the
case where the requester is running in a different locale.
:-(
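
Here is a sketch of the extended-segment layout these clients use, as I understand the Compound Text Encoding specification. The segment is ESC % / F M L, followed by the encoding name, STX, and the encoded bytes. The helper names are hypothetical, and "koi8-r" is only an example name:

```python
from __future__ import annotations

# Sketch of a COMPOUND_TEXT extended segment ("Non-Standard Character Set
# Encodings"):  ESC % / F M L <encoding name> STX <encoded bytes>
# F is '0'..'4' ('0' = variable number of octets per character); M and L both
# have the high bit set and give the length of everything after L as
# (M - 128) * 128 + (L - 128).  The name "koi8-r" is only illustrative.

STX = 0x02
INTRO = b"\x1b%/"  # ESC % /


def make_extended_segment(name: str, payload: bytes, octets_per_char: int = 0) -> bytes:
    """Wrap already-encoded bytes in an extended segment labeled with name."""
    if not 0 <= octets_per_char <= 4:
        raise ValueError("octets_per_char must be 0..4")
    body = name.encode("latin_1") + bytes([STX]) + payload
    if len(body) >= 128 * 128:
        raise ValueError("too long for a single extended segment")
    m, l = divmod(len(body), 128)
    return INTRO + bytes([0x30 + octets_per_char, 0x80 + m, 0x80 + l]) + body


def parse_extended_segment(data: bytes) -> tuple[str, bytes, bytes]:
    """Split a leading extended segment into (encoding name, payload, rest)."""
    if len(data) < 6 or not data.startswith(INTRO):
        raise ValueError("not an extended segment")
    m, l = data[4], data[5]
    if m < 0x80 or l < 0x80:
        raise ValueError("bad length octets")
    length = (m - 0x80) * 128 + (l - 0x80)
    body, rest = data[6:6 + length], data[6 + length:]
    if len(body) != length:
        raise ValueError("truncated segment")
    name, sep, payload = body.partition(bytes([STX]))
    if not sep:
        raise ValueError("missing STX after the encoding name")
    return name.decode("latin_1"), payload, rest
```

The requester has to recognize the name after ESC % / on its own. Python, for example, can decode the payload with payload.decode(name). If the name only makes sense in the sender's locale, the text can't be decoded.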

Perhaps we should make Emacs request UTF8_STRING first if
the locale is UTF-8, and request COMPOUND_TEXT if that
request fails.
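
That fallback could look roughly like this in Python. Here request is a hypothetical stand-in for the actual X selection request; it returns the raw bytes, or None if the owner refuses the target. The locale check simply looks at LC_ALL, LC_CTYPE and LANG in the usual order:

```python
# Sketch of the proposed fallback: ask for UTF8_STRING first when the locale
# is UTF-8, and ask for COMPOUND_TEXT if that fails or the locale isn't UTF-8.
# Assumption: `request` is a hypothetical callable standing in for the X
# selection request; it returns raw bytes, or None if the target is refused.

from typing import Callable, Mapping, Optional, Tuple

Request = Callable[[str], Optional[bytes]]


def locale_is_utf8(env: Mapping[str, str]) -> bool:
    """Check the first set variable of LC_ALL, LC_CTYPE, LANG for a UTF-8 codeset."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            codeset = value.lower()
            return "utf-8" in codeset or "utf8" in codeset
    return False


def fetch_selection(request: Request, env: Mapping[str, str]) -> Tuple[str, Optional[bytes]]:
    """Return (target actually used, raw data or None)."""
    if locale_is_utf8(env):
        data = request("UTF8_STRING")
        if data is not None:
            return "UTF8_STRING", data
    return "COMPOUND_TEXT", request("COMPOUND_TEXT")
```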

---
Ken'ichi HANDA
address@hidden



