Re: Bug#577095: grep: bracket expressions fails depending on the locale
From: Jim Meyering
Date: Sat, 10 Apr 2010 09:42:28 +0200
Aníbal Monsalve Salazar wrote:
> I reproduced this bug, see below.
>
> grep --version
> GNU grep 2.6.3
>
> cat /tmp/a
> root:x:0:0:root:/root:/bin/bash
> anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
> Debian-exim:x:102:104::/var/spool/exim4:/bin/false
> ntp:x:106:108::/home/ntp:/bin/false
>
> grep -E '^[A-Z]' /tmp/a
> root:x:0:0:root:/root:/bin/bash
> Debian-exim:x:102:104::/var/spool/exim4:/bin/false
> ntp:x:106:108::/home/ntp:/bin/false
>
> grep -Ev '^[A-Z]' /tmp/a
> anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
Thanks for Cc'ing bug-grep; however, this is not a bug in grep-2.6.3.
Rather, it shows that grep-2.5.4-4 failed to honor your locale
settings.
As you noticed, what the [A-Z] range matches depends on your locale settings.
Run "locale" to print those settings.
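In the C (aka POSIX) locale [A-Z] matches only the ASCII upper-case letters ABC...Z,
but in many other locales it matches AbBcC...zZ: there the range follows
a collation order like aAbBcC...zZ, so it includes every lower-case letter except a.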
Demonstrate with this:
$ for i in a A b B c C; do \
  printf '%s: ' "$i"; echo "$i" | LC_ALL=en_US.UTF-8 grep -E '[A-Z]' || echo; done
a:
A: A
b: b
B: B
c: c
C: C
If you really want to match only the 26 ASCII upper case letters,
you can run grep in the C locale, even using that risky range notation:
$ echo b | LC_ALL=C grep '[A-Z]'
[Exit 1]
$
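Applied to the sample file from the report, the C-locale approach might look like the sketch below. It assumes a POSIX shell; the function name ascii_upper_lines is hypothetical, and the four input lines are the ones quoted above. The LC_ALL=C prefix affects only the grep invocation, not the rest of the shell session.

```shell
#!/bin/sh
# Hypothetical helper: print the input lines that begin with one of the
# 26 ASCII upper-case letters. LC_ALL=C applies only to this grep, so
# the range [A-Z] means exactly A through Z regardless of your locale.
ascii_upper_lines() {
    LC_ALL=C grep -E '^[A-Z]' "$@"
}

# The sample lines from the bug report.
sample='root:x:0:0:root:/root:/bin/bash
anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
Debian-exim:x:102:104::/var/spool/exim4:/bin/false
ntp:x:106:108::/home/ntp:/bin/false'

# Only the Debian-exim line starts with an ASCII upper-case letter.
printf '%s\n' "$sample" | ascii_upper_lines
```

Extra arguments are passed through to grep, so ascii_upper_lines /tmp/a would read the file directly.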
However, it's better to avoid the '[A-Z]' range notation and to
prefer the '[[:upper:]]' character class.
Using the [[:CLASS_NAME:]] notation is essential if you also
want to match other (non-ASCII) upper case characters in your locale:
$ echo É | LC_ALL=fr_FR.UTF-8 grep '[[:upper:]]'
É
Range notation can also match characters you did not intend, such as an accented lower-case letter:
$ echo á | LC_ALL=fr_FR.UTF-8 grep '[A-F]'
á
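Rewriting the reporter's two commands with the [[:upper:]] class instead of a range could look like this sketch. The helper names upper_lines and non_upper_lines are hypothetical. The script pins LC_ALL=C only so its output is reproducible on any system; without that line, [[:upper:]] follows your own locale and would also match non-ASCII capitals such as É.

```shell
#!/bin/sh
# Pin the locale so this sketch prints the same result everywhere.
# Remove these two lines to let [[:upper:]] follow your own locale.
LC_ALL=C
export LC_ALL

# Hypothetical helper: lines that start with an upper-case letter,
# as defined by the current locale's [[:upper:]] class.
upper_lines() {
    grep -E '^[[:upper:]]' "$@"
}

# Hypothetical helper: the complement, like the reporter's grep -Ev.
non_upper_lines() {
    grep -Ev '^[[:upper:]]' "$@"
}

# The sample lines from the bug report.
sample='root:x:0:0:root:/root:/bin/bash
anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
Debian-exim:x:102:104::/var/spool/exim4:/bin/false
ntp:x:106:108::/home/ntp:/bin/false'

printf '%s\n' "$sample" | upper_lines      # the Debian-exim line
printf '%s\n' "$sample" | non_upper_lines  # the root, anibal and ntp lines
```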