Subject: Re: Bug#387704: grep: -i breaks \W in some locales (perhaps UTF-8 locales only)
From: Aníbal Monsalve Salazar
Date: Sun, 29 Mar 2009 16:46:22 +1100
User-agent: Mutt/1.5.18 (2008-05-17)

forwarded 387704 address@hidden
thanks
On Sun, Jan 11, 2009 at 11:14:07AM -0500, Ruben Molina wrote:
>On Sat, Sep 16, 2006 at 10:35:26AM +0200, Christoph Biedl wrote:
>>Package: grep
>>Version: 2.5.1.ds2-5
>>Severity: normal
>>
>>I noticed that enabling --ignore-case suddenly caused certain patterns
>>not to match any longer although they should:
>>
>>$ echo 'foo bar' | grep '^foo\W'
>>foo bar
>>$ echo 'foo bar' | grep -i '^foo\W'
>>$
>>
>>Digging further reveals that the locale has an influence, since
>>$ echo 'foo bar' | LANG=C grep -i '^foo\W'
>>foo bar
>>$
>>
>>matches again. After a check using all my generated locales:
>>
>>MATCH:
>>- de_DE
>>- address@hidden
>>- en_US
>>
>>FAIL:
>>- de_DE.UTF-8
>>- address@hidden
>>- en_US.UTF-8
>>
>>there's a strong impression that UTF-8 locales somehow disturb \W when
>>using -i.
>>
>>Even more confusing, using the bracket expression instead of the
>>synonym matches again:
>>$ echo 'foo bar' | LANG=de_DE.UTF-8 grep -i '^foo[^[:alnum:]]'
>>foo bar
>>$
>>
>>For the record, this sounds somewhat similar to #209194 and #218873
>>but these bugs are fixed in this version (2.5.1.ds2-5), I've checked.
>>
>>By the way, there's a typo in the manpage
>>
>> and
>> .B \eW
>> is a synonym for
>>- .BR [^[:alnum]] .
>>+ .BR [^[:alnum:]] .
>> .PP
>>
>>-- System Information:
>>Debian Release: testing/unstable
>> APT prefers testing
>> APT policy: (500, 'testing')
>>Architecture: i386 (i686)
>>Shell: /bin/sh linked to /bin/bash
>>Kernel: Linux 2.6.17.13
>>Locale: address@hidden, address@hidden (charmap=UTF-8)
>>
>>Versions of packages grep depends on:
>>ii libc6 2.3.6.ds1-4 GNU C Library: Shared libraries
>>
>>grep recommends no packages.
>>
>>-- no debconf information
>
>tags 387704 + confirmed
>found 387704 2.5.3~dfsg-6
>thanks
>
>$ locale
>LANG=es_CO.UTF-8
>LC_CTYPE="es_CO.UTF-8"
>LC_NUMERIC="es_CO.UTF-8"
>LC_TIME="es_CO.UTF-8"
>LC_COLLATE="es_CO.UTF-8"
>LC_MONETARY="es_CO.UTF-8"
>LC_MESSAGES="es_CO.UTF-8"
>LC_PAPER="es_CO.UTF-8"
>LC_NAME="es_CO.UTF-8"
>LC_ADDRESS="es_CO.UTF-8"
>LC_TELEPHONE="es_CO.UTF-8"
>LC_MEASUREMENT="es_CO.UTF-8"
>LC_IDENTIFICATION="es_CO.UTF-8"
>LC_ALL=
>
>$ echo 'foo bar' | grep '^foo\W'
>foo bar
>$
>
>$ echo 'foo bar' | grep -i '^foo\W'
>$
>
>$ echo 'foo bar' | LANG=C grep -i '^foo\W'
>foo bar
>$
I can reproduce this bug with 2.5.4:

$ grep -V
GNU grep 2.5.4

$ echo 'foo bar' | grep '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=C grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=en_AU grep -i '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=en_AU.UTF-8 grep -i '^foo\W'; echo $?
1
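
For anyone who wants to repeat the checks above across several locales, here is a rough sketch of a loop. The helper name check_pattern is made up for this mail and is not part of grep, and the locale list is only an example; substitute whatever `locale -a` reports on your system. It uses LC_ALL rather than LANG so that no other LC_* setting in the environment can override the locale under test, and it tries both \W and the bracket expression from the original report.

```shell
#!/bin/sh
# check_pattern LOCALE PATTERN
# Hypothetical helper (not part of grep): feed the sample line from this
# report to "grep -i" under the given locale and say whether it matched.
check_pattern() {
    # LC_ALL takes precedence over LANG and all other LC_* variables,
    # but only for this one grep invocation.
    if printf 'foo bar\n' | LC_ALL="$1" grep -qi -- "$2" 2>/dev/null; then
        printf '%s: %s %s\n' MATCH "$1" "$2"
    else
        printf '%s: %s %s\n' FAIL "$1" "$2"
    fi
}

# Example locale list (an assumption; adjust to the locales you have generated).
for loc in C en_AU en_AU.UTF-8; do
    check_pattern "$loc" '^foo\W'
    check_pattern "$loc" '^foo[^[:alnum:]]'
done
```

Note that a locale which has not been generated makes grep fall back to the C locale, so a MATCH line for such a name says nothing about the bug. Check the list against `locale -a` first.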