Re: Fwd: Re: Inadequate documentation of silly characters on screen.


From: David Kastrup
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 23:16:24 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)

Alan Mackenzie <address@hidden> writes:

> Hi, Eli!
>
> On Thu, Nov 19, 2009 at 09:52:20PM +0200, Eli Zaretskii wrote:
>> > Date: Thu, 19 Nov 2009 18:08:48 +0000
>> > From: Alan Mackenzie <address@hidden>
>> > Cc: address@hidden
>
>> > No, you (all of you) are missing the point.  That point is that if an
>> > Emacs Lisp hacker writes "?ñ", it should work, regardless of what
>> > "codepoint" it has, what "bytes" represent it, whether those "bytes"
>> > are coded with a different codepoint, or what have you.
>
>> No can do, as long as we support both unibyte and multibyte buffers
>> and strings.
>
> This seems to be the big thing.  That ?ñ has no unique meaning.

Wrong.  It means the character code of the character ñ in Emacs'
internal encoding.
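
(A quick way to see this is to evaluate the read syntax directly, e.g. with C-x C-e in a *scratch* buffer; on a Unicode-based Emacs, 23 and later, the internal code of ñ is its Unicode codepoint:)

    ?ñ              ; => 241 (#xF1, the Unicode codepoint of ñ)
    (characterp ?ñ) ; => t -- just an integer in the character range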

> The current situation violates the description on the elisp page
> "Basic Char Syntax", which describes the situation as I understood it
> up until half an hour ago.

Hm?


    2.3.3.1 Basic Char Syntax
    .........................

    Since characters are really integers, the printed representation of
    a character is a decimal number.  This is also a possible read
    syntax for a character, but writing characters that way in Lisp
    programs is not clear programming.  You should _always_ use the
    special read syntax formats that Emacs Lisp provides for characters.
    These syntax formats start with a question mark.

This makes very very very clear that we are talking about an integer
here.

Not that the parent node doesn't also mention this:

    2.3.3 Character Type
    --------------------

    A "character" in Emacs Lisp is nothing more than an integer.  In
    other words, characters are represented by their character codes.
    For example, the character `A' is represented as the integer 65.
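
A minimal illustration of that equivalence, evaluable in any Emacs:

    ?A        ; => 65
    (= ?A 65) ; => t
    (+ ?A 1)  ; => 66, plain integer arithmetic on a "character"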

>> > OK.  Surely displaying it as "\361" is a bug?
>
>> If `a' can be represented as 97, then why cannot \361 be represented
>> as 4194289?
>
> ROFLMAO.  If this weren't true, you couldn't invent it.  ;-)

Since raw bytes above 127 are not legal utf-8 sequences and we want some
character representation for them, and since character codes 128 to 255
are already valid Unicode codepoints, the obvious solution is to use
numbers that aren't valid Unicode codepoints.  One could have chosen
-128 to -255 for example.  Except that we don't have a natural algorithm
for encoding those in a superset of utf-8.
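
Concretely, Emacs places these raw-byte characters just above the Unicode range, at #x3FFF00 plus the byte value, which is where Eli's number comes from:

    (+ #x3FFF00 #xF1) ; => 4194289, the raw byte \361
    #x10FFFF          ; => 1114111, the last valid Unicode codepoint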

>> > So, how did the character "ñ" get turned into the illegal byte
>> > #xf1?
>
>> It did so because you used aset to put it into a unibyte string.
>
> So, what should I have done to achieve the desired effect?  How should
> I modify "(aset nl 0 ?ü)" so that it does the Right Thing?

Using aset on strings is crude.  If it were up to me, I would not allow
this operation at all.
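
(One way to get the intended effect without mutating a possibly-unibyte string is to rebuild it from the character; a sketch, where `nl' is Alan's variable from earlier in the thread:)

    ;; Replace the first character of nl with ü; concat of a
    ;; multibyte piece yields a multibyte string.
    (setq nl (concat (string ?ü) (substring nl 1)))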

>> > Are you saying that Emacs is converting "?ñ" and "?ä" into the
>> > wrong integers?
>
>> Emacs can convert it into 2 distinct integer representations.  It
>> decides which one by the context.  And you just happened to give it
>> the wrong context.
>
> OK, I understand that now, thanks.

Too bad that it's wrong.  ?ñ is the integer that is Emacs' internal
character code for ñ.  A single integer representation, only different
on Emacsen with different internal character codes.  If you want to
produce an actual string from it, use char-to-string.
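
For instance:

    (char-to-string ?ñ)      ; => "ñ", a one-character multibyte string
    (concat "n" (string ?ñ)) ; => "nñ"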

-- 
David Kastrup




