bug-gnu-emacs

bug#16048: 24.3.50; String compare surprise


From: Eli Zaretskii
Subject: bug#16048: 24.3.50; String compare surprise
Date: Wed, 04 Dec 2013 19:29:30 +0200

> From: Josh <josh@foxtail.org>
> Date: Wed, 4 Dec 2013 06:00:46 -0800
> Cc: Michael Albinus <michael.albinus@gmx.de>, 16048@debbugs.gnu.org
> 
> On Wed, Dec 4, 2013 at 5:07 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> 
> > michael.albinus@gmx.de writes:
> >
> > > The following form evals to nil:
> > >
> > >   (string-equal "\377" "ÿ")
> >
> > "\377" is a unibyte string.  When converted to multibyte it yields
> > "\x3fffff".
> 
> 
> At least as of 24.3, the manual[0] suggests that such a conversion
> should not occur in this case:

And it doesn't occur, indeed:

  (multibyte-string-p "\377")

    => nil
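
To see where the "\x3fffff" comes from, one can compare the literal with
an explicit conversion.  This is only a sketch for *scratch*, using
`aref' to look at the single element of each string; the explicit call
to `string-to-multibyte' is my addition, nothing in the reported form
performs it:

```elisp
;; The literal itself is unibyte; its only element is the raw byte 255.
(multibyte-string-p "\377")                        ; => nil
(aref "\377" 0)                                    ; => 255

;; Only an explicit conversion produces the character Andreas mentions.
(multibyte-string-p (string-to-multibyte "\377"))  ; => t
(aref (string-to-multibyte "\377") 0)              ; => 4194303 (#x3fffff)
```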

>     You can also use hexadecimal escape sequences (`\xN') and octal
>     escape sequences (`\N') in string constants.  *But beware:* If a
>     string constant contains hexadecimal or octal escape sequences,
>     and these escape sequences all specify unibyte characters (i.e.,
>     less than 256), and there are no other literal non-ASCII
>     characters or Unicode-style escape sequences in the string, then
>     Emacs automatically assumes that it is a unibyte string.  That is
>     to say, it assumes that all non-ASCII characters occurring in the
>     string are 8-bit raw bytes.
> 
> [0] (info "(elisp) Non-ASCII in Strings")

Best citation contest?  You're on!

   -- Function: string= string1 string2
       This function returns `t' if the characters of the two strings
       match exactly.  Symbols are also allowed as arguments, in which
       case the symbol names are used.  Case is always significant,
       regardless of `case-fold-search'.

   [...]

       For technical reasons, a unibyte and a multibyte string are
       `equal' if and only if they contain the same sequence of character
       codes and all these codes are either in the range 0 through 127
       (ASCII) or 160 through 255 (`eight-bit-graphic').  However, when a
       unibyte string is converted to a multibyte string, all characters
       with codes in the range 160 through 255 are converted to
       characters with higher codes, whereas ASCII characters remain
       unchanged.  Thus, a unibyte string and its conversion to multibyte
       are only `equal' if the string is all ASCII.

Note the last sentence.
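
A small illustration of that sentence, as a sketch only.  The helper
`my-equal-after-conversion-p' is a made-up name, not something in
Emacs, and I write the second string of the reported form with the
Unicode-style escape \u00FF so that it does not depend on how this
message is encoded:

```elisp
;; Hypothetical helper, not part of Emacs: compare a unibyte string
;; with its own multibyte conversion.
(defun my-equal-after-conversion-p (unibyte)
  "Return non-nil if UNIBYTE is `string-equal' to its multibyte conversion."
  (string-equal unibyte (string-to-multibyte unibyte)))

(my-equal-after-conversion-p "abc")    ; => t, the string is all ASCII
(my-equal-after-conversion-p "\377")   ; => nil, byte 255 becomes #x3fffff

;; The form from the original report, with the second string as \u00FF.
(string-equal "\377" "\u00FF")         ; => nil
```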




