bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-c

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date:	Sat, 03 Nov 2012 23:13:40 +0200

> From: "Drew Adams" <drew.adams@oracle.com>
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054@debbugs.gnu.org
> 
> I think I understand this (but I might be misunderstanding).  The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.

Yes.

> That is, the literal string in my code is read as a string that contains only
> a single raw byte of octal 240 in place of the 4 chars \240 (and instead of as
> a string with the multibyte char no-break space).  Is that right?

Yes.
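
A minimal sketch of the distinction confirmed above, assuming Emacs 23 or later. The variable names are hypothetical and only serve the illustration; the "\u00A0" escape is one way to write the multibyte character directly.

```elisp
;; Octal escape: the Lisp reader builds a unibyte string that holds
;; a single raw byte (octal 240).
(defvar my-raw-literal "\240")      ; hypothetical name

;; \u escape: the Lisp reader builds a multibyte string that holds
;; the character U+00A0 NO-BREAK SPACE.
(defvar my-nbsp-literal "\u00A0")   ; hypothetical name

(multibyte-string-p my-raw-literal)   ; nil
(multibyte-string-p my-nbsp-literal)  ; t
(length my-raw-literal)               ; 1, not 4
```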

> And putting that together with Eli's statement about insertion ("'insert'
> treats strings such as "\nnn" as unibyte strings"), I understand that the
> buffer text after I type `C-q 240' contains a unibyte raw byte, and not the
> multibyte char no-break space.

No.  It contains the NBSP.  Try it.  C-q inserts a multibyte
character, unlike '(insert "\240")', for example.

> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.

Because it is.

> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.

Chong didn't use "\240".

> So I'm confused about what is actually in the buffer.  From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position.  But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.

Try '(insert "\240")'; "C-x =" will then show a raw byte.
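
The experiment can also be run non-interactively. The following is a sketch only, assuming Emacs 23 or later; the helper name is hypothetical. It inserts a string into a temporary multibyte buffer and returns the first character found there.

```elisp
(defun my-first-char-after-insert (string)   ; hypothetical helper
  "Insert STRING into a temporary multibyte buffer; return its first char."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert string)
    (char-after (point-min))))

;; The unibyte literal arrives in the buffer as a raw byte, which
;; Emacs represents as character code #x3FFFA0.
(my-first-char-after-insert "\240")

;; The multibyte literal arrives as U+00A0 NO-BREAK SPACE (#xA0).
(my-first-char-after-insert "\u00A0")
```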

> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
> 
> I can see how that can be useful.  But I can also see how it would be useful
> to have some way of using octal syntax to match multibyte chars.  Isn't there
> some reasonable way to allow for both?

Maybe, but we didn't find one, at least not one that would be
backward-compatible.

> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?

Such functions do exist, see the "Converting Representations" node in
the ELisp manual.
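
As a rough illustration, assuming Emacs 23 or later and with hypothetical variable names: 'string-to-multibyte' changes the representation but keeps the raw byte, whereas decoding the byte with a coding system (here Latin-1, chosen only for the example) yields the NBSP character.

```elisp
;; Representation change only: the raw byte stays a raw byte,
;; now stored as character code #x3FFFA0 in a multibyte string.
(defvar my-raw-as-multibyte (string-to-multibyte "\240"))

;; Decoding interprets the byte according to a coding system;
;; in Latin-1, byte octal 240 is NO-BREAK SPACE.
(defvar my-nbsp-from-latin-1 (decode-coding-string "\240" 'iso-8859-1))

(aref my-raw-as-multibyte 0)            ; #x3FFFA0
(equal my-nbsp-from-latin-1 "\u00A0")   ; t
```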

> (decode-coding-string "\302\240" 'utf-8)
> 
> That allows use of only octal syntax - good.  But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).

You don't want this, because even if you succeed in producing a NBSP
in Emacs 22 and older, the result will not match NBSP in other
charsets.  It's simply impossible with those versions of Emacs.
