Re: address@hidden: [patch] url-hexify-string does not follow W3C spec]
From: Stefan Monnier
Subject: Re: address@hidden: [patch] url-hexify-string does not follow W3C spec]
Date: Tue, 01 Aug 2006 10:32:05 -0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
>>>> What incompatibility? If the string only contains ASCII and
>>>> eight-bit-*, then encoding it with utf-8 will return the same string
>>>> of bytes (except in a unibyte string rather than multibyte string).
>>> Here's an example:
>>> (encode-coding-string "\x80" 'utf-8)
>>> => "\302\200"
>> Duh! Looks like a serious bug to me.
>> Handa-san, what's up with that?
> ??? \x80 == U+0080 is a valid Unicode character in "C1
> Controls" block.
Why was it chosen to represent U+0080 with \x80?
The problem with it is that it makes it impossible to reliably carry
byte-streams embedded in multibyte strings. Oh well, I guess that EBCDIC
and friends also make it impossible anyway :-(
> However, I agree that the following is very questionable
> behaviour:
>>> (encode-coding-string (string-as-unibyte "\x80") 'utf-8)
>>> => "\302\200"
> But, that is a long standing problem, and should be fixed
> (if necessary) after the release.
It should be fixed by signalling an error: if the string is unibyte it's
already encoded.
Stefan
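
A minimal sketch of the kind of check proposed above, written against a current Emacs rather than the 22.0.50 development version discussed in this thread. The name `my-strict-encode-coding-string` is hypothetical and not part of Emacs. Letting pure-ASCII unibyte strings through is an added assumption, made because ASCII-only string literals are unibyte in Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-strict-encode-coding-string (string coding-system)
  "Encode STRING with CODING-SYSTEM, rejecting already-encoded input.
Signal an error if STRING is unibyte and contains a byte above 127."
  (let ((i 0)
        (len (length string))
        (has-raw-byte nil))
    ;; Only unibyte strings are inspected; multibyte text is always encoded.
    (unless (multibyte-string-p string)
      (while (and (< i len) (not has-raw-byte))
        (setq has-raw-byte (> (aref string i) 127)
              i (1+ i))))
    (when has-raw-byte
      (error "Unibyte string with non-ASCII bytes looks already encoded"))
    (encode-coding-string string coding-system)))

;; Multibyte string holding the character U+0080: encoding is legitimate.
(my-strict-encode-coding-string (string #x80) 'utf-8)

;; Unibyte string holding the raw byte #x80: the helper signals an error.
;; (my-strict-encode-coding-string (unibyte-string #x80) 'utf-8)
```

The first call should return the two bytes #xC2 #x80, which Emacs prints as "\302\200", the same result quoted earlier in the thread. The commented-out call is the case that would be rejected instead of being encoded a second time.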