
Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)


From: Harald Hanche-Olsen
Subject: Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
Date: Thu, 15 May 2008 08:11:46 +0200 (CEST)

+ Stefan Monnier <address@hidden>:

> I just want to see more
> examples to better understand the context and try to figure out what's
> the right way to fix the problem.  Notice that in your example,
> 
>    (setq foo (make-string 4 ?a))
>    (aset foo 1 ?å)
>    (aset foo 1 ?€) ; => Error: args out of range
> 
> the problem comes from the fact that now that we use Unicode, ?å = 229.
> So this integer is also the code of a byte, which is why the first aset
> succeeds.

Right. Or perhaps more accurately, it is why the first aset succeeds
without automagically converting foo to a multibyte string.

> Maybe the better answer is for `make-string' to always create
> multibyte strings, just like `string' now does.

Hmm. Except it doesn't, quite:

(multibyte-string-p (string ?a ?b ?c ?d)) => nil
(multibyte-string-p (string ?a ?b ?c ?å)) => t

It seems to be the presence of non-ASCII that triggers the creation of
a multibyte string, even though in this case a unibyte string could
also hold the result. In fact, the current behaviours of string and
make-string are quite similar:

(multibyte-string-p (make-string 3 ?a)) => nil
(multibyte-string-p (make-string 3 ?å)) => t

> In any case if you stay far away from `aset on strings' your life will
> be generally better, the birds will sing and the sun will shine.

8) I am willing to believe that.

> >   The most basic way to alter the contents of an existing string is with
> >   `aset' (*note Array Functions::).  `(aset STRING IDX CHAR)' stores CHAR
> >   into STRING at index IDX.  Each character occupies one or more bytes,
> >   and if CHAR needs a different number of bytes from the character
> >   already present at that index, `aset' signals an error.
> 
> > That last bit actually seems to be outdated: An error is not ALWAYS
> > signaled in the indicated situation, only sometimes.
> 
> I hope the text is correct, if not, please report it as a bug.

Okay. I'll run it past you here first, though, since my understanding
of multibyte strings is still patchy. This succeeds and returns "€a€":

(let ((str (make-string 3 ?€)))
  (aset str 1 ?a)
  str)

If I am not mistaken ?€ needs two bytes (or more?) while ?a needs one,
right? And since two (or more) is different from one, the above text
claims that aset signals an error? Or is my understanding wrong? There
is code in aset to shuffle the contents of a multibyte string around
in case of a size mismatch, however:

      if (prev_bytes != new_bytes)
        {
          /* We must relocate the string data.  */

> > (defun mew-addrstr-parse-syntax-list (str sep addrp &optional depth allow-spc)
> >   (when str
> >     (let* ((i 0) (len (length str))
> >            (par-cnt 0) (tmp-cnt 0) (sep-cnt 0)
> >            (tmp (mew-make-string len))
> >            c ret prevc)
> >       (catch 'max
> >         (while (< i len)
> >           (setq c (aref str i)) ; <= problem occurs here
> >           ... deleted ...)))))
> 
> Hmm... I don't see any `aset'.

Rats. Not enough caffeine, too much work. The deleted code is a big
(cond ...), about 80 lines long, that I didn't want to burden the list
with (it performs parsing after all). I assure you that it contains
(aset tmp tmp-cnt c) in multiple places.

It could have achieved the same result by consing up a list of the
characters and using (apply 'string (nreverse char-list)), or perhaps
by appending chars to a temporary buffer, but it didn't.
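
To make the list-based alternative concrete, here is a minimal sketch
of it. The function name my-copy-chars is made up for illustration and
is not part of Mew; the real code would do its parsing inside the loop
rather than copying every character:

```elisp
;; Hypothetical helper, not from Mew: collect characters in a list
;; and build the string in one go, so no `aset' on a string is needed.
(defun my-copy-chars (str)
  (let ((i 0)
        (len (length str))
        chars)
    (while (< i len)
      ;; `push' builds the list in reverse order.
      (push (aref str i) chars)
      (setq i (1+ i)))
    ;; `string' takes the characters as separate arguments, hence `apply'.
    (apply 'string (nreverse chars))))
```

Since `string' decides by itself whether the result must be multibyte,
this should sidestep the unibyte/multibyte mismatch altogether.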

- Harald
