emacs-devel
Re: Emacs 23 character code space


From: Kenichi Handa
Subject: Re: Emacs 23 character code space
Date: Tue, 04 Nov 2008 16:35:10 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> Thanks, this definitely helps.  Unfortunately, you worked from a
> non-current version of nonascii.texi; I already modified the first
> section heavily.  Please take a look when you can: I intend to
> downplay the unibyte stuff heavily, while the previous version gave
> unibyte and multibyte almost equal coverage.

Oops, I couldn't do "cvs update" until yesterday.  I've just
done it.

> In any case, I will certainly use what you wrote.  Thanks!

It seems that your latest changes go up to "@defun unibyte
string" (before @section Converting Text Representations).
May I leave it to you to merge what I wrote into the
sections before "Character Code"?  I'll continue fixing the
rest of the document after that point.

> > @acronym{ASCII} characters occupy one
> > byte, address@hidden characters occupy two to five bytes

> So I guess you agree that NEWS is not entirely correct saying that we
> use UTF-8 internally: UTF-8 uses only 1 to 4 bytes, not 1 to 5.
> Should I fix NEWS in this regard, saying that the internal
> representation is based on UTF-8, but extends it to handle additional
> characters?

As far as I remember, "UTF-8" was originally a general
mechanism for serializing a 31-bit unsigned value into a
byte stream (thus sequences of up to 6 bytes).  At some
point, though, it seems to have been restricted to the
Unicode range (thus sequences of up to 4 bytes).  I think
that's a regression.  Anyway, yes, it is better to mention
that Emacs uses an extended UTF-8 (or the original UTF-8).
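
To make the byte counts above concrete, here is a minimal sketch of
the original 31-bit scheme in Python.  It is an illustration only,
not Emacs's actual implementation: the names utf8_length and
encode_original_utf8 are made up for this example, and the value
0x3FFFFF is just a convenient 22-bit number that lies beyond the
4-byte range.

```python
# Illustrative sketch of the original (31-bit) UTF-8 serialization.
# The function names are hypothetical; this is not code from Emacs.

# Largest value representable with 1, 2, ..., 6 bytes.
_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def utf8_length(value: int) -> int:
    """Return how many bytes the 31-bit scheme needs for VALUE."""
    if value < 0 or value > _LIMITS[-1]:
        raise ValueError("value does not fit in 31 bits")
    for nbytes, limit in enumerate(_LIMITS, start=1):
        if value <= limit:
            return nbytes
    raise AssertionError("unreachable")


def encode_original_utf8(value: int) -> bytes:
    """Serialize VALUE as a 1- to 6-byte sequence."""
    nbytes = utf8_length(value)
    if nbytes == 1:
        return bytes([value])
    out = bytearray(nbytes)
    # Fill trailing bytes from the end: each carries 6 payload bits.
    for i in range(nbytes - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)
        value >>= 6
    # The lead byte starts with NBYTES one-bits followed by a zero bit:
    # 0xC0, 0xE0, 0xF0, 0xF8, 0xFC for 2 to 6 bytes.
    lead_prefix = (0xFF << (8 - nbytes)) & 0xFF
    out[0] = lead_prefix | value
    return bytes(out)


if __name__ == "__main__":
    for v in (0x41, 0x20AC, 0x10FFFF, 0x3FFFFF, 0x7FFFFFFF):
        print(hex(v), utf8_length(v), encode_original_utf8(v).hex())
```

For values inside the Unicode range the result matches ordinary
UTF-8; values above 0x1FFFFF need the 5-byte form, and values above
0x3FFFFFF the 6-byte form.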

---
Kenichi Handa
address@hidden



