bug-gnu-utils

Re: grep 2.5.1: NUL byte doesn't match a complemented character class


From: Joe Wells
Subject: Re: grep 2.5.1: NUL byte doesn't match a complemented character class
Date: Thu, 23 Aug 2007 13:13:28 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Jim Meyering <address@hidden> writes:

> Joe Wells <address@hidden> wrote:
>> In grep 2.5.1, a NUL character doesn't match a complemented character
>> class.
> ...
>
> Works for me, using Debian's 2.5.1.ds2-6:
>
>   $ print '\0x'|grep '[^x]x'
>   Binary file (standard input) matches
>   $ grep '[^x]x'
>   address@hidden
>   Binary file (standard input) matches
>
> I get the same results with grep-2.5.1-57.fc7.

Hi, Jim,

Thanks very much for following my description of how to reproduce the
bug!  Your negative report is very helpful because it has inspired me
to investigate further.

I can now see what the difference between your environment and mine
must be.  I am also using this environment variable setting:

  LC_CTYPE=en_US.UTF-8

When I change this (just for the “grep” process) to

  LC_CTYPE=C

the problem goes away.
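
A minimal sketch of the comparison described above, assuming a POSIX shell, that the en_US.UTF-8 locale is installed, and that LC_ALL should not override LC_CTYPE.  The helper name try_locale is hypothetical, and the outcome depends on the grep and libc versions in use, so no expected output is given.

```shell
# try_locale is a hypothetical helper: $1 is the LC_CTYPE value to test.
# LC_ALL is emptied for the grep process so that LC_CTYPE takes effect.
# If the named locale is not installed, grep may silently fall back to "C".
try_locale() {
    if printf '\0x\n' | LC_ALL= LC_CTYPE="$1" grep -q '[^x]x'; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

try_locale C
try_locale en_US.UTF-8
```

Setting the variable only on the grep command keeps the rest of the shell session in its usual locale.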

I am confused by this.  My understanding of UTF-8 is that a single
octet with all bits off gets interpreted as character U+0000 (“NUL”
a.k.a. “NULL”), which is a perfectly valid Unicode character.
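
The reading of UTF-8 above can be checked independently of grep.  This is only a sketch, assuming the iconv, od and tr utilities are available; the variable name nul_utf16 is illustrative.  It converts a single zero octet from UTF-8 to UTF-16BE, which iconv would reject if the octet were not valid UTF-8.

```shell
# Convert one NUL octet from UTF-8 to UTF-16BE and dump the result in hex.
# iconv reports an error on input that is not valid UTF-8.
nul_utf16=$(printf '\0' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')
echo "U+0000 as UTF-16BE: $nul_utf16"
```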

By the way, I have verified with “od -b” (your program!) exactly what
octets grep is seeing.
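
For anyone repeating that check, a sketch of the octet dump, assuming the test input is the NUL, "x", newline sequence from the reproduction (the variable name bytes is only illustrative):

```shell
# Dump the test input as octal bytes: 000 is NUL, 170 is "x", 012 is newline.
bytes=$(printf '\0x\n' | od -b)
printf '%s\n' "$bytes"
```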

I suppose the problem might be in glibc?  Or perhaps there is a bug in
the locale data files?

(By the way, if any character in the input is being discarded for some
reason, e.g., invalid UTF-8 format, could grep please generate an error
message when that happens?  Otherwise such problems will be too
difficult to track down.)

For reference, I am using Ubuntu 6.06 LTS (“Dapper Drake”) with all
libraries and utilities up to date.  Therefore, you can check at
<URL:http://packages.ubuntu.com/dapper/> to see what version of any
package I am running.

> [resending just to you, because your mail server blocked my first reply
>
>   <address@hidden>: host izanami.macs.hw.ac.uk[137.195.13.6] said: 550
>       82.230.74.64 is listed in rbl-plus.mail-abuse.ja.net (in reply to
>       RCPT TO command)
>       ]

Sorry about that!  I don't know why that RBL lists that IP address
(mx.meyering.net).  I'm glad I got your second e-mail.

-- 
Joe



