help-gnu-emacs

Differences between identical strings in Emacs lisp


From: Jürgen Hartmann
Subject: Differences between identical strings in Emacs lisp
Date: Mon, 6 Apr 2015 15:21:46 +0200

What is the difference between the string represented by the constant "\xBA"
and the result of (concat '(#xBA))?

Background:

When I start Emacs 24.4 on Linux with the -Q option and the POSIX locale to
get a clean environment, i.e.

   LC_ALL=C emacs -Q

the evaluation of

   "\xBA"

in *scratch* (Lisp Interaction mode) yields a result printed as "\272".

In contrast to that, the result of

   (concat '(#xBA))

is printed as "º", i.e. the "masculine ordinal indicator" glyph in double
quotes. The glyph's character is described by the command describe-char as
follows:

-----------------------------------------------------------------------------
             position: 235 of 341 (69%), column: 1
            character: º (displayed as º) (codepoint 186, #o272, #xba)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xBA
               script: latin
               syntax: _     which means: symbol
             category: .:Base, L:Left-to-right (strong), h:Korean, j:Japanese, l:Latin
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xC2 #xBA
            file code: #xC2 #xBA (encoded by coding system nil)
              display: by this font (glyph code)
    xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-15-*-*-*-m-0-iso10646-1 (#x7C)

Character code properties: customize what to show
  name: MASCULINE ORDINAL INDICATOR
  general-category: Lo (Letter, Other)
  decomposition: (super 111) (super 'o')

There are text properties here:
  face                 font-lock-string-face
  fontified            t

[back]
-----------------------------------------------------------------------------

Evidently the result of (concat '(#xBA)) gets interpreted (decoded) on the
basis of the Unicode charset, while "\xBA" is treated as a raw byte.
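
One way to check this observation is to ask Emacs how each string is stored.
The following is only a minimal sketch: it assumes the standard functions
multibyte-string-p and string-bytes, and the variable names literal and built
are chosen purely for illustration.

```elisp
;; Sketch: inspect the internal representation of both strings.
;; `literal' and `built' are illustrative names, not from the original post.
(let ((literal "\xBA")            ; string constant written with a hex escape
      (built (concat '(#xBA))))   ; string built from the character code 186
  (list (multibyte-string-p literal)   ; nil: stored as a unibyte string
        (multibyte-string-p built)     ; t:   stored as a multibyte string
        (string-bytes literal)         ; 1:   the single raw byte #xBA
        (string-bytes built)))         ; 2:   the bytes #xC2 #xBA
```

If the assumption holds, the form returns (nil t 1 2), which matches the
"buffer code: #xC2 #xBA" line in the describe-char output above.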

Comparing these strings directly also shows that they are different:

   (string= "\xBA" (concat '(#xBA)))

evaluates to nil.

On the other hand, the expressions

   (append "\xBA" ())

and

   (append (concat '(#xBA)) ())

both evaluate to (186), indicating that the strings contain the same
character(s). By that measure they are identical.
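
A further probe, again only a sketch, is to convert the constant explicitly
before looking at its elements. It assumes the standard functions
string-to-multibyte and decode-coding-string and the latin-1 coding system;
the choice of Latin-1 is an assumption made for illustration, not something
the results above imply.

```elisp
;; Sketch: compare the two strings after explicit conversion.
;; `literal' and `built' are illustrative names, not from the original post.
(let ((literal "\xBA")
      (built (concat '(#xBA))))
  (list
   ;; Converting the unibyte constant to multibyte keeps it a raw byte,
   ;; whose character code #x3FFFBA lies outside the Unicode range.
   (append (string-to-multibyte literal) nil)
   ;; Decoding the byte as Latin-1 (an assumed encoding) yields
   ;; character 186, so the comparison with `built' succeeds.
   (string= (decode-coding-string literal 'latin-1) built)))
```

Under these assumptions the form returns ((4194234) t), so the element lists
agree only as long as the constant is left unconverted.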

How can this contradiction be resolved?

Since I could not find a clue in the manuals or via Google, any explanation,
idea, hint, or link would be greatly appreciated.

Juergen
