Re: Display of characters #xa0 and #xad in unibyte buffers


From: Kenichi Handa
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 10:10:32 +0900

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >> 
> > >> The characters are displayed as "_-" (approximately).
> > >> 
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> > 
> > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > interpreted as a character, and shown as such.  This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?

I think the main feature of unibyte buffers is to handle
raw bytes as they are.  For those who want to see a raw byte
as a character of their locale (language environment), we
have unibyte-display-via-language-environment.

> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?

> I don't know, but it sounds like a bug.  Or maybe what I wrote above
> is just my pipe dream, not the reality.

> Handa-san, can you please comment on this?

The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment.  I've just
installed the attached change.
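
The classification that the attached change introduces can be
sketched in isolation.  This is a simplified stand-alone
illustration, not the xdisp.c code: classify_nobreak, its
parameters, and the decode_fn callback are hypothetical names
standing in for it->c, it->multibyte_p,
unibyte_display_via_language_environment, Vnobreak_char_display,
and DECODE_CHAR.

```c
/* Stand-alone sketch of the NO-BREAK SPACE / SOFT HYPHEN
   classification.  All names are hypothetical; they only mirror
   the logic of the patch.  */

enum nobreak_kind { NOBREAK_NONE = 0, NOBREAK_NBSP = 1, NOBREAK_SHY = 2 };

/* Hypothetical stand-in for DECODE_CHAR (unibyte, c).  */
typedef int (*decode_fn) (int byte);

/* Map a character code to its no-break kind.  */
static enum nobreak_kind
kind_of (int code)
{
  return (code == 0xA0 ? NOBREAK_NBSP
          : code == 0xAD ? NOBREAK_SHY
          : NOBREAK_NONE);
}

enum nobreak_kind
classify_nobreak (int c, int multibyte_p, int via_langenv,
                  int nobreak_display_enabled, decode_fn decode)
{
  /* ASCII is never affected, and nothing happens when the
     feature is switched off.  */
  if (c < 0x80 || !nobreak_display_enabled)
    return NOBREAK_NONE;

  /* Multibyte buffer: C is already a character code.  */
  if (multibyte_p)
    return kind_of (c);

  /* Unibyte buffer shown via the language environment: look at
     the decoded character, not the raw byte.  */
  if (via_langenv)
    return kind_of (decode (c));

  /* Plain unibyte buffer: the byte is a raw byte, never special.  */
  return NOBREAK_NONE;
}

/* Example decoder: in Latin-1 each byte maps to the same code.  */
int
decode_latin1 (int byte)
{
  return byte;
}
```

With this sketch, byte 0xA0 in a plain unibyte buffer is classified
as NOBREAK_NONE, so it would fall through to the ordinary octal
escape like the other eight-bit bytes.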

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~ 2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c   2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
>             /* Handle non-break space in the mode where it only gets
>                highlighting.  */
 
> -           if (EQ (Vnobreak_char_display, Qt)
> +           if ((it->multibyte_p || unibyte_display_via_language_environment)
> +               && EQ (Vnobreak_char_display, Qt)
>                 && it->c == 0xA0)

If unibyte_display_via_language_environment is nonzero, we
must compare DECODE_CHAR (unibyte, it->c) against 0xA0, not
the raw byte.  Otherwise, in a KOI8 locale for instance, we
wrongly treat a box-drawing character of the KOI8 charset as
a no-break space.
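
To make the KOI8 case concrete, here is a small stand-alone sketch.
It assumes KOI8-R specifically, where byte 0x9A is NO-BREAK SPACE
(U+00A0) and byte 0xA0 is a box-drawing character (U+2550).  The
function names are hypothetical, and the decoder covers only the
two bytes of interest; it is not DECODE_CHAR.

```c
/* Hypothetical partial decoder for KOI8-R, covering only the two
   bytes relevant to this discussion.  Returns -1 for eight-bit
   bytes this sketch does not cover.  */
int
decode_koi8r_sample (int byte)
{
  switch (byte)
    {
    case 0x9A: return 0x00A0;  /* NO-BREAK SPACE */
    case 0xA0: return 0x2550;  /* BOX DRAWINGS DOUBLE HORIZONTAL */
    default:   return byte < 0x80 ? byte : -1;
    }
}

/* Comparing the raw byte: what a test of it->c alone would do.  */
int
is_nbsp_raw (int byte)
{
  return byte == 0xA0;
}

/* Comparing the decoded character instead of the raw byte.  */
int
is_nbsp_decoded (int byte)
{
  return decode_koi8r_sample (byte) == 0xA0;
}
```

Under these assumptions the raw test accepts byte 0xA0 (the
box-drawing character) and misses byte 0x9A (the real no-break
space), while the decoded test gets both right.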

---
Kenichi Handa
address@hidden

Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c     27 Sep 2009 19:11:13 -0000      1.1304
+++ xdisp.c     28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
        {
          Lisp_Object dv;
          struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+         int nbsp_or_shy = 0; /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+         int decoded = it->c;
 
          if (it->dp
              && (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
              goto get_next;
            }
 
+         if (unibyte_display_via_language_environment
+             && it->c >= 0x80)
+           decoded = DECODE_CHAR (unibyte, it->c);
+
+         if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+           {
+             if (it->multibyte_p)
+               nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+             else if (unibyte_display_via_language_environment)
+               nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+           }
+
          /* Translate control characters into `\003' or `^C' form.
             Control characters coming from a display table entry are
             currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
             If it->multibyte_p is zero, eight-bit characters that
             don't have corresponding multibyte char code are also
             translated to octal form.  */
-         else if ((it->c < ' '
-                   ? (it->area != TEXT_AREA
-                      /* In mode line, treat \n, \t like other crl chars.  */
-                      || (it->c != '\t'
-                          && it->glyph_row
-                      && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
-                      || (it->c != '\n' && it->c != '\t'))
-                   : (it->multibyte_p
-                      ? (!CHAR_PRINTABLE_P (it->c)
-                         || (!NILP (Vnobreak_char_display)
-                             && (it->c == 0xA0 /* NO-BREAK SPACE */
-                                 || it->c == 0xAD /* SOFT HYPHEN */)))
-                      : (it->c >= 127
-                         && (! unibyte_display_via_language_environment
-                             || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+         if ((it->c < ' '
+              ? (it->area != TEXT_AREA
+                 /* In mode line, treat \n, \t like other crl chars.  */
+                 || (it->c != '\t'
+                     && it->glyph_row
+                     && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+                 || (it->c != '\n' && it->c != '\t'))
+              : (nbsp_or_shy
+                 || (it->multibyte_p
+                     ? ! CHAR_PRINTABLE_P (it->c)
+                     : (! unibyte_display_via_language_environment
+                        ? it->c >= 0x80
+                        : (decoded >= 0x80 && decoded < 0xA0))))))
            {
              /* IT->c is a control character which must be displayed
                 either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
                 highlighting.  */
 
              if (EQ (Vnobreak_char_display, Qt)
-                 && it->c == 0xA0)
+                 && IS_NBSP)
                {
                  /* Merge the no-break-space face into the current face.  */
                  face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
                 highlighting.  */
 
              if (EQ (Vnobreak_char_display, Qt)
-                 && it->c == 0xAD)
+                 && IS_SHY)
                {
                  it->c = '-';
                  XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
              /* Handle non-break space and soft hyphen
                 with the escape glyph.  */
 
-             if (it->c == 0xA0 || it->c == 0xAD)
+             if (nbsp_or_shy)
                {
                  XSETINT (it->ctl_chars[0], escape_glyph);
-                 it->c = (it->c == 0xA0 ? ' ' : '-');
+                 it->c = (IS_NBSP ? ' ' : '-');
                  XSETINT (it->ctl_chars[1], it->c);
                  ctl_len = 2;
                  goto display_control;
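
For readers following the rewritten condition in the hunk at line
5724, the unibyte branch can be read as a small predicate.  This is
only a sketch with hypothetical names; it ignores the nbsp_or_shy
and control-character (it->c < ' ') cases handled by the rest of
the condition.

```c
/* Sketch of the unibyte branch of the new condition.  C is the raw
   byte, DECODED is the result of decoding it through the unibyte
   charset (only meaningful when VIA_LANGENV is nonzero).  Returns
   nonzero if the byte should be shown in escaped form.  */
int
unibyte_needs_escape (int c, int via_langenv, int decoded)
{
  if (!via_langenv)
    /* Raw-byte display: every eight-bit byte is escaped.  */
    return c >= 0x80;

  /* Language-environment display: only characters that decode into
     the range 0x80..0x9F are escaped.  */
  return decoded >= 0x80 && decoded < 0xA0;
}
```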



