grep-devel

Re: [Grep-devel] handling of non-BMP characters


From: Bruno Haible
Subject: Re: [Grep-devel] handling of non-BMP characters
Date: Sun, 16 Dec 2018 20:27:58 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-139-generic; KDE/5.18.0; x86_64; ; )

Hi Jim,

> > Assaf Gordon wrote:
> > > "surrogate-pair" test fails on:
> > >     AIX 7.2
> >
> > It also fails on Cygwin (that is, on the platform for which this test was
> > initially introduced by Corinna Vinschen, in 2013).
> 
> Thanks.
> With that, I conclude it is time to disable this test, and have just
> done so with the following:
> https://git.savannah.gnu.org/cgit/grep.git/commit/?id=bdb98cec2e7bf255e1d00eaf8be16299f7bf571e

To me, that amounts to sweeping a serious regression under the rug.

Recall what the test does: It creates a file 'in', whose content is a single
(non-BMP) character, followed by a newline. Then it runs
   grep --file=in in
On glibc systems and more generally on systems where wchar_t is a 32-bit type,
this invocation prints the character and exits with code 0.
On Cygwin systems (and, under some conditions, also AIX systems), this
invocation prints nothing and exits with code 1.
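
For anyone who wants to try this by hand, here is a minimal sketch of the
scenario described above. It is not the test script from the grep source tree.
The character U+10405 (UTF-8 bytes F0 90 90 85), the locale name en_US.UTF-8
and the use of a temporary directory are assumptions chosen for illustration;
any non-BMP character and any installed UTF-8 locale should do.

```shell
# Sketch only: reproduces the scenario, not the actual test from the grep tree.
# Assumption: U+10405 stands in for "a single non-BMP character".
workdir=$(mktemp -d)

# UTF-8 encoding of U+10405 (bytes F0 90 90 85), followed by a newline.
printf '\360\220\220\205\n' > "$workdir/in"

# Assumption: en_US.UTF-8 is installed; substitute any UTF-8 locale.
# The file serves both as the pattern list and as the input.
LC_ALL=en_US.UTF-8 grep --file="$workdir/in" "$workdir/in" > "$workdir/out"
status=$?

# Where wchar_t is a 32-bit type, the expectation is exit status 0 and
# an 'out' file identical to 'in'. The failure reported on Cygwin is
# exit status 1 and an empty 'out'.
echo "exit status: $status"
```

Comparing the two files afterwards, for example with cmp "$workdir/in"
"$workdir/out", shows whether the character was printed back unchanged.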

To me, that is serious, because from the user's point of view, characters should
not be handled differently depending on whether they are in the BMP or not.
(Recall that this is happening in a UTF-8 locale.)

It's a regression, because as I understand it from the commit logs, the test
must have succeeded on Cygwin right after Corinna Vinschen committed it.

Bruno




