Re: Bug Report grep -E Debian Squeeze
From: Eric Blake
Subject: Re: Bug Report grep -E Debian Squeeze
Date: Mon, 25 Mar 2013 13:35:26 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4
On 03/25/2013 05:47 AM, Jean-Marc Messina wrote:
> Hi
>
> I hope I am reporting this bug the right way; if not, please accept my
> apologies and ignore this mail, as it's my first bug report.
>
> We have been facing a weird behaviour of "grep -E" on Debian Squeeze
> versions which seems not to happen in lenny or wheezy versions.
The behavior you are seeing is locale-dependent.
>
> Example:
>
> echo "tanZANIE" | grep -E '^[a-z]{2,20}$'
> No output (normal behaviour)
>
> echo "tanzANIE" | grep -E '^[a-z]{2,20}$'
> output : "tanzANIE"
You are probably running grep in a locale that has case-insensitive
collation, where the range [a-z] actually expands to [aAbB...yYz]
(but not Z). For example, glibc's en_US.UTF-8 locale has that behavior.
That is why "tanzANIE" matches but "tanZANIE" does not: A, N, I and E
fall inside the range, while Z sorts after z and falls outside it.
POSIX says that the behavior of range expressions in regular expressions
is undefined outside of the C locale, precisely because of this
rather confusing historical behavior.
There is an effort underway to convert GNU tools to use Rational Range
Interpretation, where [a-z] will always be interpreted as [abc...yz]
regardless of locale, even when libc would behave otherwise by default.
I'm not sure whether that conversion has reached the version of grep that
you are using, but it may be part of the explanation for the difference
you are seeing. The other thing to do is to compare the output of 'locale'
on the machines that differ.
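As a rough sketch of such a check, you can probe whether [a-z] matches an uppercase letter under the locale currently in effect, and run it next to 'locale' on each machine. The helper name range_matches_upper is made up for illustration, and the messages only describe the likely cause, not a definitive diagnosis:

```shell
# Hypothetical helper: succeeds if the range [a-z] matches the uppercase
# letter B under the locale currently in effect.
range_matches_upper() {
    echo "B" | grep -q -E '^[a-z]$'
}

if range_matches_upper; then
    echo "[a-z] matches B: collation appears to interleave upper and lower case"
else
    echo "[a-z] does not match B: C-like ordering or rational ranges"
fi
```

If two machines print different lines here, the difference lies in the locale settings or in how that grep build interprets ranges.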
Meanwhile, the only PORTABLE way to get the behavior you want is to
avoid range expressions outside of the C locale, by either spelling out
the range:
  echo "tanzANIE" | grep -E '^[abcdefghijklmnopqrstuvwxyz]{2,20}$'
or by forcing the locale:
  echo "tanzANIE" | LC_ALL=C grep -E '^[a-z]{2,20}$'
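To convince yourself that both workarounds agree, a small sketch like the following runs each sample word through both of them. The function name classify and the extra all-lowercase word "tanzanie" are my own additions for illustration; the two patterns are the ones shown above:

```shell
# Hypothetical helper: report whether WORD passes each workaround.
classify() {
    word=$1
    # Workaround 1: spell out the lowercase letters explicitly.
    if echo "$word" | grep -q -E '^[abcdefghijklmnopqrstuvwxyz]{2,20}$'; then
        explicit=yes
    else
        explicit=no
    fi
    # Workaround 2: force the C locale for this one grep invocation.
    if echo "$word" | LC_ALL=C grep -q -E '^[a-z]{2,20}$'; then
        forced=yes
    else
        forced=no
    fi
    echo "$word: explicit=$explicit forced_c=$forced"
}

for w in tanZANIE tanzANIE tanzanie; do
    classify "$w"
done
```

Only the all-lowercase word should be reported as matching, whatever locale the script is started in.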
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org