emacs-devel


From: Kenichi Handa
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
Date: Mon, 27 Jan 2003 16:38:39 +0900 (JST)

In article <address@hidden>, "Stefan Monnier" <monnier+gnu/address@hidden> 
writes:
> I don't understand your question.  When people use string-FOO-multibyte
> it's generally because they don't understand what's going on and they
> think "a char is a char is a char and I don't get this multibyte madness":
> using decode-coding-string would force them to better understand what's
> going on.

But I suspect that such people won't use the correct coding
system anyway.  To use the correct coding system, they must
clearly understand what kind of multibyte string they want.
And if they understand that, there should be no difficulty
in using the correct string-FOO-multibyte function.

In one sense, it seems clean to use the concept of decoding
and encoding for all unibyte<->multibyte conversions
coherently.  But, that hides what Emacs actually does.

You wrote:
> I find it more helpful to think in terms of bytes and chars:

Definitely.  But,

> unibyte strings are sequences of bytes while multibyte
> strings are sequences of chars.

Unfortunately no.

Emacs can represent a character sequence in both unibyte and
multibyte strings.  Emacs can also represent a raw-byte
sequence in both unibyte and multibyte strings.  For a
multibyte string, which of the two it represents (char-seq or
byte-seq) can be detected from the kind of characters it
contains.  But for a unibyte string that is impossible; only
the context in which it is used decides.
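
To make the distinction above concrete, here is a minimal sketch.  It
assumes Emacs 23 or later (which postdates this message), where raw
bytes in a multibyte string are the characters #x3FFF80 through
#x3FFFFF.  The function name `my-string-kind' and the returned symbols
are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper; assumes Emacs 23 or later, where raw bytes in
;; a multibyte string have character codes #x3FFF80..#x3FFFFF.
(defun my-string-kind (string)
  "Guess whether STRING holds a char-seq or a byte-seq.
Return `undetermined' for a unibyte string, because its contents
alone do not say which one it is."
  (if (not (multibyte-string-p string))
      'undetermined
    (let ((raw nil))
      ;; Look for at least one raw-byte character.
      (dotimes (i (length string))
        (when (>= (aref string i) #x3FFF80)
          (setq raw t)))
      ;; No raw-byte character found: treat it as a char-seq.
      (if raw 'byte-seq 'char-seq))))
```

For example, (my-string-kind (string-to-multibyte (unibyte-string #xC3 #xA9)))
reports a byte-seq, while the same two bytes left in a unibyte string
give `undetermined'.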

For string-make-multibyte, the input is a char-seq, and the
result of conversion is also a char-seq.  So, the concept of
decoding is not applicable here.

For string-to-multibyte, the input is a byte-seq, and the
result of conversion is also a byte-seq.  So, again, the
concept of decoding is not applicable either.

For string-as-multibyte, the input is a byte-seq, and the
result of conversion is a char-seq.  So, only here is the
concept of decoding applicable.
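
The difference between a byte-seq to byte-seq conversion and a
byte-seq to char-seq conversion can be sketched as follows.  This
assumes Emacs 23 or later.  The function name `my-conversion-demo' is
hypothetical, and utf-8 is only a stand-in coding system picked for
the example; it is not the `internal' alias discussed in this thread.
string-make-multibyte and string-as-multibyte are left out because
later Emacs versions declare them obsolete.

```elisp
;; Hypothetical demo; assumes Emacs 23 or later.
(defun my-conversion-demo ()
  "Contrast `string-to-multibyte' with decoding on the same two bytes."
  (let* ((bytes (unibyte-string #xC3 #xA9))              ; two raw bytes
         (kept (string-to-multibyte bytes))              ; byte-seq -> byte-seq
         (decoded (decode-coding-string bytes 'utf-8)))  ; byte-seq -> char-seq
    (list (multibyte-string-p bytes)                ; nil
          (multibyte-string-p kept)                 ; t
          (length kept)                             ; 2, two raw-byte characters
          (length decoded)                          ; 1, one character
          (aref decoded 0)                          ; #xE9, e with acute accent
          (equal (string-to-unibyte kept) bytes)))) ; t, the bytes survive
```

The last element shows why decoding is the wrong model for
string-to-multibyte: the original bytes can be recovered unchanged,
because nothing was interpreted as a character.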

I hope this explains why I insist on string-FOO-multibyte
functions.

By the way, it may be good to introduce coding system
aliases `internal' and `default', and write, for instance,
in the docstring of string-as-multibyte that the effect is
the same as (decode-coding-string UNIBYTE-STRING 'internal).

---
Ken'ichi HANDA
address@hidden
