emacs-devel

Re: Emacs 23 character code space


From: Eli Zaretskii
Subject: Re: Emacs 23 character code space
Date: Sat, 01 Nov 2008 18:46:09 +0200

Another fragment from etc/NEWS that seems not entirely accurate:

    In buffers and strings, characters are represented by UTF-8 byte
    sequences in a multibyte buffer/string.

But UTF-8 defines 1- to 4-byte sequences to represent each Unicode
codepoint, whereas this comment from character.h:

    /* character code       1st byte   byte sequence
       --------------       --------   -------------
            0-7F            00..7F     0xxxxxxx
           80-7FF           C2..DF     110xxxxx 10xxxxxx
          800-FFFF          E0..EF     1110xxxx 10xxxxxx 10xxxxxx
        10000-1FFFFF        F0..F7     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
       200000-3FFF7F        F8         11111000 1000xxxx 10xxxxxx 10xxxxxx 10xxxxxx
       3FFF80-3FFFFF        C0..C1     1100000x 10xxxxxx (for eight-bit-char)
       400000-...           invalid

       invalid 1st byte     80..BF     10xxxxxx
                            F9..FF     11111xxx (xxx != 000)
    */

seems to say that we use up to 5 bytes.

What am I missing?
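
To make the table concrete, here is a minimal sketch that reads it as a
lookup from character code to sequence length. The function name
sketch_char_bytes is hypothetical and is not taken from character.h; the
ranges are copied from the comment quoted above, and returning 0 for the
"invalid" row is an assumption of this sketch.

```c
/* Hypothetical helper, not Emacs's own code: return the number of bytes
   the quoted table assigns to character code C, or 0 if C falls in the
   row marked "invalid" (400000 and above).  */
int
sketch_char_bytes (unsigned int c)
{
  if (c <= 0x7F)
    return 1;                   /* 0xxxxxxx */
  if (c <= 0x7FF)
    return 2;                   /* 110xxxxx 10xxxxxx */
  if (c <= 0xFFFF)
    return 3;                   /* 1110xxxx + 2 trailing bytes */
  if (c <= 0x1FFFFF)
    return 4;                   /* 11110xxx + 3 trailing bytes */
  if (c <= 0x3FFF7F)
    return 5;                   /* 11111000 + 4 trailing bytes */
  if (c <= 0x3FFFFF)
    return 2;                   /* eight-bit-char, 1st byte C0..C1 */
  return 0;                     /* outside the code space */
}
```

With this reading, sketch_char_bytes (0x10FFFF) gives 4 and
sketch_char_bytes (0x200000) gives 5, so the 5-byte row is reached only by
codes above the last Unicode codepoint, 10FFFF.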



