bug-grep
Re: Bug#387704: grep: -i breaks \W in some locales (perhaps UTF-8 locales only)


From: Aníbal Monsalve Salazar
Subject: Re: Bug#387704: grep: -i breaks \W in some locales (perhaps UTF-8 locales only)
Date: Sun, 29 Mar 2009 16:46:22 +1100
User-agent: Mutt/1.5.18 (2008-05-17)

forwarded 387704 address@hidden
thanks

On Sun, Jan 11, 2009 at 11:14:07AM -0500, Ruben Molina wrote:
>On Sat, Sep 16, 2006 at 10:35:26AM +0200, Christoph Biedl wrote:
>>Package: grep
>>Version: 2.5.1.ds2-5
>>Severity: normal
>>
>>I noticed that enabling --ignore-case suddenly caused certain patterns
>>not to match any longer although they should:
>>
>>$ echo 'foo bar' | grep    '^foo\W'
>>foo bar
>>$ echo 'foo bar' | grep -i '^foo\W'
>>$
>>
>>Digging further reveals that the locale has an influence, since
>>$ echo 'foo bar' | LANG=C grep -i '^foo\W'
>>foo bar
>>$
>>
>>matches again. After a check using all my generated locales:
>>
>>MATCH:
>>- de_DE
>>- address@hidden
>>- en_US
>>
>>FAIL:
>>- de_DE.UTF-8
>>- address@hidden
>>- en_US.UTF-8
>>
>>there's a strong impression that UTF-8 locales somehow disturb \W when
>>using -i.
>>
>>Even more confusing, using the bracket expression instead of the
>>synonym matches again:
>>$ echo 'foo bar' | LANG=de_DE.UTF-8 grep -i '^foo[^[:alnum:]]'
>>foo bar
>>$
>>
>>For the record, this sounds somewhat similar to #209194 and #218873,
>>but I have checked that those bugs are fixed in this version (2.5.1.ds2-5).
>>
>>By the way, there's a typo in the manpage:
>>
>>  and
>>  .B \eW
>>  is a synonym for
>>- .BR [^[:alnum]] .
>>+ .BR [^[:alnum:]] .
>>  .PP
>>
>>-- System Information:
>>Debian Release: testing/unstable
>>  APT prefers testing
>>  APT policy: (500, 'testing')
>>Architecture: i386 (i686)
>>Shell:  /bin/sh linked to /bin/bash
>>Kernel: Linux 2.6.17.13
>>Locale: address@hidden, address@hidden (charmap=UTF-8)
>>
>>Versions of packages grep depends on:
>>ii  libc6                        2.3.6.ds1-4 GNU C Library: Shared libraries
>>
>>grep recommends no packages.
>>
>>-- no debconf information
>
>tags  387704  + confirmed
>found 387704 2.5.3~dfsg-6
>thanks
>
>$ locale
>LANG=es_CO.UTF-8
>LC_CTYPE="es_CO.UTF-8"
>LC_NUMERIC="es_CO.UTF-8"
>LC_TIME="es_CO.UTF-8"
>LC_COLLATE="es_CO.UTF-8"
>LC_MONETARY="es_CO.UTF-8"
>LC_MESSAGES="es_CO.UTF-8"
>LC_PAPER="es_CO.UTF-8"
>LC_NAME="es_CO.UTF-8"
>LC_ADDRESS="es_CO.UTF-8"
>LC_TELEPHONE="es_CO.UTF-8"
>LC_MEASUREMENT="es_CO.UTF-8"
>LC_IDENTIFICATION="es_CO.UTF-8"
>LC_ALL=
>
>$ echo 'foo bar' | grep    '^foo\W'
>foo bar
>$
>
>$ echo 'foo bar' | grep -i '^foo\W'
>$
>
>$ echo 'foo bar' | LANG=C grep -i '^foo\W'
>foo bar
>$

I can reproduce this bug with grep 2.5.4:

$ grep -V
GNU grep 2.5.4

$ echo 'foo bar' | grep '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=C grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=en_AU grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=en_AU.UTF-8 grep -i '^foo\W'; echo $?
1
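
The per-locale check described in the original report can be repeated with a
small loop. This is only a minimal sketch, not the reporter's actual script:
check_locale is a hypothetical helper name, and it sets LC_ALL rather than
LANG (an assumption on my side) so that an LC_ALL or LC_CTYPE already present
in the environment cannot mask the locale under test.

```shell
#!/bin/sh
# Report whether grep -i '^foo\W' matches 'foo bar' under the given locale.
# check_locale is a hypothetical helper name, not part of grep.
check_locale() {
    # grep -q prints nothing; only its exit status is used.
    if echo 'foo bar' | LC_ALL="$1" grep -i -q '^foo\W' 2>/dev/null; then
        echo "MATCH: $1"
    else
        echo "FAIL: $1"
    fi
}

# Walk every locale generated on this system.
for loc in $(locale -a 2>/dev/null); do
    check_locale "$loc"
done
```

On an affected grep, the UTF-8 locales should be the ones listed as FAIL,
while C and the non-UTF-8 locales are listed as MATCH.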



