emacs-devel
Re: Fwd: Re: Inadequate documentation of silly characters on screen.


From: Alan Mackenzie
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 12:21:19 +0000
User-agent: Mutt/1.5.9i

Hi, Andreas,

On Thu, Nov 19, 2009 at 11:16:03AM +0100, Andreas Schwab wrote:
> Alan Mackenzie <address@hidden> writes:

> > So my `aset' invocation is trying to write a multibyte ?ñ into a
> > unibyte ?\n, and gets truncated from #x8f1 to #xf1 in the process.

> Nothing gets truncated.  In Emacs 23 ?ñ is simply the number 241,
> whereas in Emacs 22 it is the number 2289.  You can put 2289 in a string
> in Emacs 23, but there is no defined Unicode character with that value.

Ah, thanks!  So when I do 

   M-: (setq nl "\n")
   M-: (aset nl 0 ?ñ)
   M-: (insert nl)

, after the `aset', the string nl correctly contains one character, which
is the single byte #xf1.  The bug happens in `insert', where something
seems to be interpreting the byte #xf1 as the signed integer
#xfffff.....ffff1.
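To make that suspicion concrete, here is a minimal sketch of what sign extension of the byte #xf1 would look like in C.  The helper names are hypothetical and nothing here is taken from the Emacs sources; it only shows the arithmetic, and it assumes the usual two's-complement behaviour when converting to int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a byte through a signed 8-bit type.
   On two's-complement platforms the top bit is copied into all the
   higher bits, so #xf1 becomes a negative number.  */
static int32_t widen_signed(uint8_t byte)
{
  return (int32_t) (int8_t) byte;
}

/* Hypothetical helper: widen the same byte without sign extension.  */
static int32_t widen_unsigned(uint8_t byte)
{
  return (int32_t) byte;
}

/* Small self-check of the two behaviours for the byte #xf1.  */
static void demo_widening(void)
{
  assert(widen_unsigned(0xF1) == 241);                  /* #xf1 */
  assert(widen_signed(0xF1) == -15);                    /* sign-extended */
  assert((uint32_t) widen_signed(0xF1) == 0xFFFFFFF1u); /* #xfffffff1 */
}
```

Whether `insert' really does anything like this is exactly the open question; the sketch only shows what the symptom would be if it did.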

Delving into the bowels of Emacs, I find this in character.h:

  #define STRING_CHAR_AND_LENGTH(p, len, actual_len)              \
    (!((p)[0] & 0x80)                                             \
     ? ((actual_len) = 1, (p)[0])                                 \
     : ! ((p)[0] & 0x20)                                          \
     ? ((actual_len) = 2,                                         \
        (((((p)[0] & 0x1F) << 6)                                  \
          | ((p)[1] & 0x3F))                                      \
         + (((unsigned char) (p)[0]) < 0xC2 ? 0x3FFF80 : 0)))     \
     : ! ((p)[0] & 0x10)                                          \
     ? ((actual_len) = 3,                                         \
        ((((p)[0] & 0x0F) << 12)                                  \
         | (((p)[1] & 0x3F) << 6)                                 \
         | ((p)[2] & 0x3F)))                                      \
     : string_char ((p), NULL, &actual_len))
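To see which branch a given lead byte takes, the three inline cases of the macro can be written out as a plain function.  This is only a sketch: decode_inline is a hypothetical name, and returning -1 with a length of 0 is my stand-in for "the macro would call string_char here".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function form of the three inline branches of
   STRING_CHAR_AND_LENGTH as quoted above.  Returns the decoded
   character and stores the sequence length in *len.  Where the macro
   would fall through to string_char, stores 0 in *len and returns -1.  */
static int decode_inline(const unsigned char *p, int *len)
{
  assert(p != NULL && len != NULL);

  if (!(p[0] & 0x80))           /* 0xxxxxxx: single byte */
    {
      *len = 1;
      return p[0];
    }
  if (!(p[0] & 0x20))           /* 110xxxxx: two-byte sequence */
    {
      *len = 2;
      return (((p[0] & 0x1F) << 6) | (p[1] & 0x3F))
             + (p[0] < 0xC2 ? 0x3FFF80 : 0);
    }
  if (!(p[0] & 0x10))           /* 1110xxxx: three-byte sequence */
    {
      *len = 3;
      return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }

  *len = 0;                     /* 1111xxxx: left to string_char */
  return -1;
}
```

Fed the two bytes #xc3 #xb1 (the UTF-8 form of ñ), this returns #xf1 with a length of 2.  Fed the lone byte #xf1, all three tests fail, because bits #x80, #x20 and #x10 are all set in it.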

#xf1 drops through all this nonsense to string_char (in character.c).  It
drops through to this case:

  else if (! (*p & 0x08))
    {
      c = ((((p)[0] & 0xF) << 18)
           | (((p)[1] & 0x3F) << 12)
           | (((p)[2] & 0x3F) << 6)
           | ((p)[3] & 0x3F));
      p += 4;
    }

, where it obviously becomes silly.  At least, I think that's where it
ends up.  This isn't the most maintainable piece of code in Emacs.
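As a rough illustration of why the result is silly, the quoted four-byte case can be run on its own.  The function names are hypothetical, and the bytes following #xf1 are pure assumptions on my part: what actually sits after the one-byte string in memory is not something this sketch can know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate mirroring the quoted test `! (*p & 0x08)'.  */
static int takes_four_byte_case(const unsigned char *p)
{
  assert(p != NULL);
  return !(p[0] & 0x08);
}

/* Hypothetical stand-alone copy of the quoted four-byte case.  It reads
   p[1]..p[3] unconditionally, so the caller must supply four bytes.  */
static int decode_four(const unsigned char *p)
{
  assert(p != NULL);
  return ((p[0] & 0xF) << 18)
         | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6)
         | (p[3] & 0x3F);
}
```

With #xf1 followed by three zero bytes, decode_four gives #x40000; with #xf1 followed by the assumed bytes "abc" it gives #x618a3.  Either way the value depends on three bytes that were never part of the character, and it is nowhere near 241.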

So, if ISO-8859-1 characters are now represented as single bytes in
Emacs, what test for multibyticity should STRING_CHAR_AND_LENGTH be using?

> Andreas.

-- 
Alan Mackenzie (Nuremberg, Germany).



