

From: David M. Cooke
Subject: bug#9608: 24.0.50; Emacs lisp reader thinks no-break space is 0x08a0 (should be 0x00a0)
Date: Mon, 26 Sep 2011 17:00:34 -0700


While reading through lread.c (I was writing an Emacs Lisp lexer for
syntax highlighting in Pygments), I discovered that the reader treats the
Unicode code point U+08A0 as whitespace (with the comment "NBSP"). I believe
this was meant to be U+00A0 (NO-BREAK SPACE), as the code point U+08A0 has no
character assigned to it yet (it lies between the Samaritan and the
Devanagari blocks).
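
To make the suspected mix-up concrete, here is a rough Python model of a
reader-style whitespace test. It is only a sketch, not the code in lread.c:
the names is_reader_whitespace and split_tokens are made up for
illustration, and the rule "anything up to 0x20, plus the NBSP constant,
separates tokens" is an assumption.

```python
# Hypothetical model of the reader's whitespace test; NOT the code in lread.c.
NBSP_AS_REPORTED = 0x08A0  # the constant this report says the reader uses
NBSP_INTENDED = 0x00A0     # U+00A0 NO-BREAK SPACE


def is_reader_whitespace(ch, nbsp=NBSP_AS_REPORTED):
    """Assumed rule: code points <= 0x20 and the NBSP constant separate tokens."""
    cp = ord(ch)
    return cp <= 0x20 or cp == nbsp


def split_tokens(text, nbsp=NBSP_AS_REPORTED):
    """Split text into tokens using the assumed whitespace rule."""
    tokens, current = [], []
    for ch in text:
        if is_reader_whitespace(ch, nbsp):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


sample = "c\u00a0d e\u08a0f"
print(split_tokens(sample))                      # splits at U+08A0, keeps U+00A0
print(split_tokens(sample, nbsp=NBSP_INTENDED))  # splits at U+00A0, keeps U+08A0
```

With the constant as reported, U+00A0 stays inside a token and U+08A0 splits
one; with the intended constant the roles swap.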

You can also see this by evaluating the following Lisp code:
(mapcar (lambda (sym) (string-as-unibyte (symbol-name sym)))
        (read "(a b c\u00a0d e\u08a0f g \u00a0 h i \u08a0 j)"))

This gives the result
("a" "b" "c\302\240d" "e" "f" "g" "\302\240" "h" "i" "j")
where we can see that U+00A0 (UTF-8: "\302\240") is being treated as a
symbol-constituent character, whereas U+08A0 is treated as whitespace.
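
For anyone who wants to double-check the byte sequences outside Emacs, a
small Python sketch follows. The helper name octal_escaped_utf8 is made up
for illustration; it prints the UTF-8 encoding of a character using the same
backslash-octal notation as the result above.

```python
def octal_escaped_utf8(ch):
    """Return the UTF-8 bytes of ch as backslash-octal escapes."""
    return "".join("\\%03o" % byte for byte in ch.encode("utf-8"))


print(octal_escaped_utf8("\u00a0"))  # \302\240
print(octal_escaped_utf8("\u08a0"))  # \340\242\240
```

So a "\302\240" inside a symbol name is U+00A0, while U+08A0 would show up
as the three-byte sequence "\340\242\240" if it had survived as part of a
symbol.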

The changes to the whitespace handling were introduced in bzr revision
78902 (on 2007-07-30, which is a few weeks after a discussion about
handling NO-BREAK SPACE on the mailing list). I'm guessing using 0x8a0
was just a thinko.

cheers,
David M. Cooke <david.m.cooke@gmail.com>



In GNU Emacs 24.0.50.2 (x86_64-apple-darwin10.7.0, NS apple-appkit-1038.35)
of 2011-05-27 on mars.lan
Windowing system distributor `Apple', version 10.3.1138
configured using `configure  '--with-ns''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_CA.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
( s <backspace> m a p c a r SPC ' s y m b o l - n a 
m e SPC ( e v a l SPC " ( a SPC b SPC c \ u 0 0 a 0 
d SPC e \ u 0 8 a 0 d <backspace> f ) " ) ) C-j q <down-mouse-1> 
<mouse-1> # <down-mouse-1> <mouse-1> C-j q <down-mouse-1> 
<mouse-1> ' C-e C-j q <backspace> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <backspace> 
<backspace> " <left> <left> <backspace> <backspace> 
<backspace> <backspace> r e a d C-e C-j <up> <left> 
<left> <left> <left> <left> SPC g SPC \ u 0 0 a 0 SPC 
h SPC i SPC \ u 0 8 a 0 SPC j C-e C-j <escape> x r 
e m p o r <backspace> <backspace> <backspace> <backspace> 
p o r <tab> <return>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Entering debugger...
Back to top level.
Entering debugger...
Back to top level.
Entering debugger...
Back to top level.

Load-path shadows:
None found.

Features:
(shadow sort gnus-util time-date mail-extr message format-spec rfc822
mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mailabbrev mail-utils gmm-utils
mailheader emacsbug help-mode easymenu view debug tooltip ediff-hook
vc-hooks lisp-float-type mwheel ns-win tool-bar dnd fontset image fringe
lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer loaddefs button faces cus-face files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind ns
multi-tty emacs)




