bug-gnulib

Re: Erroneous assumption in isblank.c


From: Bruno Haible
Subject: Re: Erroneous assumption in isblank.c
Date: Tue, 5 Oct 2010 11:17:38 +0200
User-agent: KMail/1.9.9

Hi,

John Darrington wrote:
> In lib/isblank.c I see the following:
> 
>  /* The "blank" characters are '\t', ' ',
>      U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+205F, U+3000, and none
>      except the first two is present in a common 8-bit encoding.  Therefore
>      the substitute for other platforms is not more complicated than this.  */
>   return (c == ' ' || c == '\t');
> 
> This is incorrect.  In iso-8859-1 (a very common 8-bit encoding), U+00A0 is the
> non-breaking-space character.

U+00A0 NO-BREAK SPACE is a glyph that carries no ink, but in other respects it
behaves like a non-blank punctuation character. In particular, its very
definition is that, unlike U+0020 SPACE, it is not an opportunity for line
breaking.

The function isblank() is not used in graphical rendering engines; it is used
in programs that do line breaking, such as 'fold':
coreutils/src/fold.c:178:                  if (isblank (to_uchar (line_out[logical_end])))
For this reason, isblank(U+00A0) *must* return false. Otherwise many programs
would treat it like U+0020 SPACE, that is, as a place where a line may be broken.
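
To illustrate the point, here is a minimal sketch in C. The function names
blank_substitute and last_break_opportunity are hypothetical and are not taken
from gnulib or coreutils; the second one only loosely imitates what a
fold-like program does when it looks for a place to break a line. The input
is assumed to be in an 8-bit encoding such as ISO-8859-1, where byte 0xA0 is
U+00A0.

```c
#include <stddef.h>

/* Substitute in the spirit of lib/isblank.c: only ' ' and '\t' count.
   Byte 0xA0 (U+00A0 in ISO-8859-1) is deliberately not treated as blank.  */
static int
blank_substitute (int c)
{
  return (c == ' ' || c == '\t');
}

/* Hypothetical helper: return the index of the last blank among the first
   WIDTH bytes of LINE (which has LEN bytes), or -1 if there is no
   line-break opportunity in that range.  */
static long
last_break_opportunity (const char *line, size_t len, size_t width)
{
  size_t limit = len < width ? len : width;
  size_t i;

  for (i = limit; i > 0; i--)
    if (blank_substitute ((unsigned char) line[i - 1]))
      return (long) (i - 1);
  return -1;
}
```

With the ISO-8859-1 input "10\xA0km away", a width of 5 yields no break
opportunity, because the only candidate in range is the no-break space. If
the predicate returned true for 0xA0, the line would be split between "10"
and "km", which is exactly what U+00A0 is meant to prevent.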

Bruno


