Re: Fwd: Re: Inadequate documentation of silly characters on screen.
From: Alan Mackenzie
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 12:21:19 +0000
User-agent: Mutt/1.5.9i

Hi, Andreas,
On Thu, Nov 19, 2009 at 11:16:03AM +0100, Andreas Schwab wrote:
> Alan Mackenzie <address@hidden> writes:
> > So my `aset' invocation is trying to write a multibyte ?ñ into a
> > unibyte ?\n, and gets truncated from #x8f1 to #xf1 in the process.
> Nothing gets truncated. In Emacs 23 ?ñ is simply the number 241,
> whereas in Emacs 22 it is the number 2289. You can put 2289 in a string
> in Emacs 23, but there is no defined unicode character with that value.
Ah, thanks! So when I do
M-: (setq nl "\n")
M-: (aset nl 0 ?ñ)
M-: (insert nl)
, after the `aset', the string nl correctly contains one character, which
is the single byte #xf1. The bug happens in `insert', where something is
interpreting the byte #xf1 as the signed integer #xfffff.....ffff1.
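
To make that suspicion concrete, here is a minimal C sketch of how a byte
with its top bit set turns into a large value when it is widened through a
signed type. It only illustrates the effect; it is not taken from the Emacs
sources, and the function names sign_extend_byte and zero_extend_byte are
made up for the example. It assumes a two's-complement target and shows a
32-bit result.

```c
#include <stdint.h>

/* Hypothetical helper: widen a byte through a signed 8-bit type.
   On two's-complement targets 0xF1 becomes -15, and the high bits
   of the widened value are all set. */
uint32_t sign_extend_byte(uint8_t byte)
{
    int8_t as_signed = (int8_t) byte;
    return (uint32_t) (int32_t) as_signed;
}

/* Hypothetical helper: widen the same byte as an unsigned value.
   The high bits stay clear. */
uint32_t zero_extend_byte(uint8_t byte)
{
    return (uint32_t) byte;
}
```

Bytes below #x80, such as the original ?\n, come out the same either way,
so only characters like ?ñ would show the difference.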
Delving into the bowels of Emacs, I find this in character.h:
  #define STRING_CHAR_AND_LENGTH(p, len, actual_len)              \
    (!((p)[0] & 0x80)                                             \
     ? ((actual_len) = 1, (p)[0])                                 \
     : ! ((p)[0] & 0x20)                                          \
     ? ((actual_len) = 2,                                         \
        (((((p)[0] & 0x1F) << 6)                                  \
          | ((p)[1] & 0x3F))                                      \
         + (((unsigned char) (p)[0]) < 0xC2 ? 0x3FFF80 : 0)))     \
     : ! ((p)[0] & 0x10)                                          \
     ? ((actual_len) = 3,                                         \
        ((((p)[0] & 0x0F) << 12)                                  \
         | (((p)[1] & 0x3F) << 6)                                 \
         | ((p)[2] & 0x3F)))                                      \
     : string_char ((p), NULL, &actual_len))
#xf1 drops through all this nonsense to string_char (in character.c). It
drops through to this case:
  else if (! (*p & 0x08))
    {
      c = ((((p)[0] & 0xF) << 18)
           | (((p)[1] & 0x3F) << 12)
           | (((p)[2] & 0x3F) << 6)
           | ((p)[3] & 0x3F));
      p += 4;
    }
, where it obviously becomes silly. At least, I think that's where it
ends up. This isn't the most maintainable piece of code in Emacs.
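
The path I think #xf1 takes can be checked with a small standalone C sketch
that repeats the same bit tests outside Emacs. The names claimed_length and
decode_four are hypothetical, and treating everything past the 4-byte test
as a 5-byte sequence is my assumption about the remaining branch of
string_char.

```c
#include <assert.h>

/* Hypothetical helper: how many bytes the lead byte claims to start,
   using the same bit tests as the macro and the string_char case above. */
int claimed_length(unsigned char lead)
{
    if (!(lead & 0x80)) return 1;
    if (!(lead & 0x20)) return 2;
    if (!(lead & 0x10)) return 3;
    if (!(lead & 0x08)) return 4;
    return 5;   /* assumed: whatever is left is handled as 5 bytes */
}

/* Hypothetical helper: the 4-byte case quoted above, applied to p[0..3]. */
int decode_four(const unsigned char *p)
{
    assert(claimed_length(p[0]) == 4);
    return ((p[0] & 0x0F) << 18)
         | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6)
         |  (p[3] & 0x3F);
}
```

With this sketch, claimed_length (0xF1) is 4, so the decoder would go on to
read three more bytes that are not part of a one-character string. If those
three bytes happen to be zero, decode_four gives #x40000, which is nothing
like 241.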
So, if ISO-8859-1 characters are now represented as single bytes in
Emacs, what test for multibyticity should STRING_CHAR_AND_LENGTH be using?
> Andreas.
--
Alan Mackenzie (Nuremberg, Germany).