Re: non-breaking hyphens


From: Chong Yidong
Subject: Re: non-breaking hyphens
Date: Mon, 17 Oct 2011 23:39:35 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.90 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>>      Some character sets define "no-break" versions of the space and
>>   hyphen characters, which are used where a line should not be broken.
>>   Emacs normally displays these characters with special faces
>>   (respectively, `nobreak-space' and `escape-glyph') to distinguish them
>>   from ordinary spaces and hyphens.
>
> The manual is referring to #xAD

Ah, I see, so the manual is confused, since U+00AD is a "soft hyphen",
not a no-break version of a hyphen.

> see this fragment from get_next_display_element (and the code
> thereafter which references nbsp_or_shy):
>
>         if (! ASCII_CHAR_P (c) && ! NILP (Vnobreak_char_display))
>           nbsp_or_shy = (c == 0xA0   ? char_is_nbsp
>                          : c == 0xAD ? char_is_soft_hyphen
>                          :             char_is_other);
>
> Based on this, I'd say that the implementation is incomplete: it only
> supports a subset of no-break characters defined by the Unicode
> standard.
>
> Note that the no-break characters should be displayed with the
> `nobreak-space' face, not `escape-glyph' face.  I think that the
> manual should also mention the nobreak-char-display variable.

I'm not sure I understand the goal of `nobreak-char-display'.  Is it for
warning the user when there is an ASCII look-alike character that isn't
really ASCII?  I guess that's mainly to avoid issues with source code?

If so, handling only U+00A0 and U+00AD would be incomplete, as you say.
Also, the name and documentation of `nobreak-char-display' are
misleading or incomplete: it shouldn't be limited to non-breaking or shy
characters, since there are many other non-ASCII lookalikes like U+2010
(the "true" hyphen), U+2002 (the "en space"), and U+2007 (the "figure
space").
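
To make the idea concrete, here is a rough sketch of what a broader
classification might look like.  The enum and function names are made
up for illustration and are not what xdisp.c uses, and the list of code
points is only a sample (the ones mentioned in this thread plus U+2011
and U+202F), not a complete set of lookalikes.

```c
/* Hypothetical names, for illustration only; not the identifiers
   used in xdisp.c.  */
enum lookalike_kind
{
  LOOKALIKE_NONE,    /* ASCII, or not a lookalike we know about */
  LOOKALIKE_SPACE,   /* non-ASCII character that resembles a space */
  LOOKALIKE_HYPHEN   /* non-ASCII character that resembles a hyphen */
};

/* Classify code point C.  The cases below are a sample, not an
   exhaustive list of space-like or hyphen-like characters.  */
static enum lookalike_kind
classify_lookalike (int c)
{
  /* ASCII characters are never flagged.  */
  if (c < 0x80)
    return LOOKALIKE_NONE;

  switch (c)
    {
    case 0x00A0:  /* NO-BREAK SPACE */
    case 0x2002:  /* EN SPACE */
    case 0x2007:  /* FIGURE SPACE */
    case 0x202F:  /* NARROW NO-BREAK SPACE */
      return LOOKALIKE_SPACE;

    case 0x00AD:  /* SOFT HYPHEN */
    case 0x2010:  /* HYPHEN */
    case 0x2011:  /* NON-BREAKING HYPHEN */
      return LOOKALIKE_HYPHEN;

    default:
      return LOOKALIKE_NONE;
    }
}
```

With something like this, the display code could choose a face from the
returned kind instead of testing for two Latin-1 code points, though
whether a hard-coded table is the right approach at all is a separate
question.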

I'm guessing the reason U+00A0 and U+00AD are treated specially is that
those characters happened to be in Latin-1 (i.e. hysterical raisins).


