emacs-devel

Re: UCS-2BE


From: Juri Linkov
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 02:32:44 +0300
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (gnu/linux)

> If UCS-2BE is a mislabel of UTF-16BE, UCS-2BE can simply be
> an alias of UTF-16BE.  If UCS-2BE is a BMP subset of
> UTF-16BE, UCS-2BE should be implemented differently from
> UTF-16BE

`UCS-2' is the fixed-length encoding of the BMP: every character takes
exactly two bytes.  `UCS-2BE' is the big-endian version of UCS-2 that
does not use a BOM.  So just as UCS-2 is the BMP subset of UTF-16,
UCS-2BE is the BMP subset of UTF-16BE (and UCS-2LE is the BMP subset
of UTF-16LE).
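
To make the subset relation concrete, here is a minimal Python sketch.
Python has no built-in UCS-2 codec, so `encode_ucs2be' below is a
hypothetical helper written only for illustration; it is not how iconv
or Emacs implements the encoding.

```python
def encode_ucs2be(text):
    """Hypothetical UCS-2BE encoder: two big-endian bytes per BMP character, no BOM."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no way to represent characters outside the BMP.
            raise ValueError("U+%04X is outside the BMP" % cp)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points are not characters.
            raise ValueError("surrogate U+%04X is not a character" % cp)
        out += cp.to_bytes(2, "big")
    return bytes(out)

# For BMP-only text the bytes are identical to UTF-16BE.
bmp_text = "A\u20ac"
assert encode_ucs2be(bmp_text) == bmp_text.encode("utf-16-be")
```

For text restricted to the BMP the two encodings produce the same
bytes; they diverge only when a character outside the BMP appears,
which this sketch rejects with an error.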

The encodings `UCS-2' and `UCS-2BE' are implemented in iconv
(http://www.gnu.org/software/libiconv/), so you could look
at the implementation of UCS-2BE:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2be.h?revision=1.4&view=markup

Comparing it with the implementation of UTF-16BE, you can see that
UTF-16BE also handles the other planes, using surrogate pairs:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16be.h?revision=1.4&view=markup
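
As a rough illustration of what handling the other planes involves,
the following Python sketch computes the UTF-16 code units for a code
point.  The helper name `utf16_units' is made up for this example, and
the code is not taken from utf16be.h; it only follows the standard
surrogate-pair arithmetic.

```python
def utf16_units(cp):
    """Return the UTF-16 code units for code point cp (sketch, no range validation)."""
    if cp < 0x10000:
        # BMP character: a single 16-bit unit, same as UCS-2.
        return [cp]
    # Supplementary character: split the 20-bit offset into a surrogate pair.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
units = utf16_units(0x1D11E)
encoded = b"".join(u.to_bytes(2, "big") for u in units)
assert encoded == "\U0001D11E".encode("utf-16-be")
```

A UCS-2BE encoder has no equivalent of the second branch, which is the
whole difference between the two encodings on the output side.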

And comparing UCS-2BE with the implementation of UCS-2, you can see that
UCS-2 also deals with a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2.h?revision=1.4&view=markup

The iconv implementations of UCS-2 and UTF-16 differ in one respect
when it comes to outputting a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16.h?revision=1.4&view=markup

i.e. converting a string to UTF-16 adds the BOM to the output, but
converting to UCS-2 doesn't add the BOM.
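
The same distinction between a BOM-writing encoding and an
explicit-endianness one can be observed with Python's standard codecs.
This is only an analogy: it shows Python's behavior, not iconv's, and
Python has no UCS-2 codec to compare against.

```python
import codecs

with_bom = "A".encode("utf-16")        # platform-native byte order, BOM prepended
without_bom = "A".encode("utf-16-be")  # explicit byte order, no BOM

# The generic "utf-16" codec starts its output with one of the two BOMs.
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert len(with_bom) == 4

# The big-endian variant emits only the character itself.
assert without_bom == b"\x00A"
```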

Does the Emacs implementation of UTF-16 output the BOM?

> (at least, we should not select it by select-safe-coding-system on
> saving a buffer that contains non-BMP characters).

What do you think is the right way to deal with non-BMP characters
when the user tries to save a UTF-16(BE) buffer in the UCS-2(BE)
encoding?

-- 
Juri Linkov
http://www.jurta.org/emacs/




