emacs-devel

Re: setenv -> locale-coding-system cannot handle ASCII?!


From: Kenichi Handa
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 14:32:16 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, "Stefan Monnier" <monnier+gnu/address@hidden> 
writes:
> I consider this context-dependent meaning of unibyte strings
> to be a problem.  I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed?  Strings and buffers are not that
different, both are containers of characters.  If we get a
unibyte string from a unibyte buffer by buffer-substring,
how should we treat that string?

>>  In the former case, as it is given to encode-coding-string,
>>  it is a multibyte form by which emacs represents
>>  character(s), not a sequence of characters representing raw
>>  bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of

>       (concat "\201" (format "%s" "hello"))
> and
>       (concat "\201" (format "%s" 1))

The latter yields a multibyte string, but I think that's a bug.
I found that `(format "%s" 1)' is implemented with
prin1-to-string, and prin1-to-string prints the object into a
temporary buffer and returns that buffer's contents.  So, in a
multibyte session, `(format "%s" 1)' yields a multibyte
string.  :-(
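Because the answer depends on how each piece was produced (and on the Emacs version), the simplest way to see it is to ask Emacs directly. Below is a minimal sketch that only relies on `multibyte-string-p', `concat' and `format'; the helper name `my-report-multibyteness' is hypothetical. No output is claimed, since the result for these expressions differs between Emacs versions.

```elisp
;; Hypothetical helper (not part of Emacs): pair each string with
;; the result of `multibyte-string-p', so the multibyteness of
;; expressions like the two above can be inspected directly.
(defun my-report-multibyteness (strings)
  "Return a list of (STRING . MULTIBYTE-P) for each element of STRINGS."
  (mapcar (lambda (s) (cons s (multibyte-string-p s)))
          strings))

;; Usage: evaluate in the session whose behaviour you want to check.
(my-report-multibyteness
 (list (concat "\201" (format "%s" "hello"))
       (concat "\201" (format "%s" 1))))
```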

>>  In the latter case, as it is given to string-to-multibyte,
>>  it should be regarded as a sequence of characters representing
>>  raw bytes, thus the result of (string-to-multibyte
>>  "\201\300") is still a sequence of raw-bytes.  Encoding
>>  raw-bytes should yield the same raw-bytes.

> Indeed, that's what I and `setenv' would want.

>>  And, this behaviour of encode-coding-string on a unibyte
>>  string is a natural consequence of encode-coding-region in a
>>  unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

    If a unibyte buffer has \201\300 in the region FROM and TO,

    (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
        => "\201\300"

    (encode-coding-region FROM TO 'iso-latin-1) changes the
    region to \300.

Isn't that even more confusing?
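The two cases above can be put side by side with a short sketch, assuming a temporary unibyte buffer holding the bytes \201\300. The function name `my-compare-string-and-region' is hypothetical; `set-buffer-multibyte', `buffer-substring', `encode-coding-string' and `encode-coding-region' are standard Emacs functions. What the two results actually are depends on the Emacs version, so no output is claimed here.

```elisp
;; Hypothetical helper: encode the same unibyte text once as a string
;; (via `buffer-substring') and once in place in the buffer, and
;; return both results so they can be compared.
(defun my-compare-string-and-region ()
  "Return (ENCODED-SUBSTRING . ENCODED-REGION) for a unibyte region."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; make the buffer unibyte
    (insert "\201\300")                 ; two raw bytes
    (let* ((from (point-min))
           (to (point-max))
           ;; Encode a copy of the region, taken as a string.
           (via-string (encode-coding-string
                        (buffer-substring from to) 'iso-latin-1)))
      ;; Encode the same region in place.
      (encode-coding-region from to 'iso-latin-1)
      (cons via-string (buffer-string)))))
```

Evaluating `(my-compare-string-and-region)' returns both results in one cons cell, which makes any difference between the string and region behaviour easy to see.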

By the way, I also really, really hate this unibyte/multibyte
problem.  Sometimes I think I should have opposed the
introduction of such a concept more strongly.

    imagine there's no unibyte 
    it's easy if you try
    no bytes below us
    above us only chars
    imagine all the people living in multibyte

:-)

---
Ken'ichi HANDA
address@hidden



