emacs-devel
Re: Display of characters #xa0 and #xad in unibyte buffers


From: Kenichi Handa
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 20:24:24 +0900

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > In article <address@hidden>, Eli Zaretskii <address@hidden> writes:
> > 
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >> 
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >> 
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > > 
> > > > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > > > interpreted as a character, and shown as such.  This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> > 
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.

> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say?  Just because they
> were inserted by C-q NNN or by some other mechanism?

They are not distinguished.

> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.

> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.

Not really, because we no longer perform automatic
unibyte<->multibyte decoding/encoding.  So, if we
cut #xC0 in a unibyte buffer and yank it in a multibyte
buffer, an eight-bit character is inserted instead of U+00C0.
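
To make the difference concrete, here is a minimal sketch, assuming
Emacs 23 or later.  The helper name `my/byte-to-chars' is hypothetical
and used only for illustration.  It converts a one-byte unibyte string
to multibyte without decoding, which is what happens when unibyte text
reaches a multibyte buffer, and contrasts that with an explicit
Latin-1 decode.

```elisp
;; Hypothetical helper: return (RAW DECODED) for a byte in #x80..#xFF.
(defun my/byte-to-chars (byte)
  (let ((byte-str (unibyte-string byte)))   ; one-byte unibyte string
    (list
     ;; No decoding: the byte becomes an eight-bit (raw-byte) character.
     (aref (string-to-multibyte byte-str) 0)
     ;; Explicit decoding as Latin-1: the byte becomes U+0080..U+00FF.
     (aref (decode-coding-string byte-str 'iso-8859-1) 0))))

;; The first element should be the eight-bit character for #xC0,
;; the second should be U+00C0.
(my/byte-to-chars #xC0)
```

The two results should differ: only the explicitly decoded one is the
Latin-1 character.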

> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?

unibyte-display-via-language-environment just controls how
to display them, and it doesn't affect how they are
interpreted.

Actually, the interpretation of characters in a unibyte
buffer is still inconsistent.  For instance,
skip-syntax-forward treats #x80..#xFF as characters
U+0080..U+00FF.  Thus #xC0 is a word-constituent and #xD7 is
a symbol.  We must fix it somehow.  But how?  We currently
don't have a suitable syntax code for eight-bit chars.
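
One way to observe this is the sketch below.  The function name
`my/unibyte-skip-counts' is hypothetical, and the result may differ
between Emacs versions.  It puts the bytes #xC0 and #xD7 into a
unibyte buffer and reports how far skip-syntax-forward moves over
word and then symbol syntax.

```elisp
;; Hypothetical helper: return (WORD-SKIP SYMBOL-SKIP) for the bytes
;; #xC0 #xD7 in a unibyte buffer with the standard syntax table.
(defun my/unibyte-skip-counts ()
  (with-temp-buffer
    (set-buffer-multibyte nil)            ; make the buffer unibyte
    (with-syntax-table (standard-syntax-table)
      (insert #xC0 #xD7)                  ; two bytes, no decoding
      (goto-char (point-min))
      (list (skip-syntax-forward "w")     ; distance moved over word chars
            (skip-syntax-forward "_")))))  ; distance moved over symbol chars

(my/unibyte-skip-counts)
```

If the bytes are treated as U+00C0 and U+00D7 as described above,
each call should advance by one position.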

---
Kenichi Handa
address@hidden



