bug-gnu-utils

Re: grep: obnoxious egrep vs i18n bug


From: Aharon Robbins
Subject: Re: grep: obnoxious egrep vs i18n bug
Date: Thu, 15 Jul 2004 12:43:05 +0300

Greetings.  Re this:

In article <address@hidden> you write:
>In egrep version 2.5.1, I'm seeing this behavior.  Note particularly the
>output of the third command:
>
>address@hidden echo ABC | egrep -i -e 'abc'
>ABC
>address@hidden echo ABC | egrep -i -e 'abc|xxx'
>ABC
>address@hidden echo ABC | egrep -i -e 'a[b]c|xxx'
>address@hidden echo ABC | egrep -i -e 'a[b]c'
>ABC
>address@hidden echo abc | egrep -i -e 'a[b]c|xxx'
>abc
>address@hidden echo $LANG
>en_US.UTF-8
>address@hidden echo ABC | LANG= egrep -i -e 'a[b]c|xxx'
>ABC
>
>I'm only slightly familiar with i18n, but I can't believe this output is
>correct.  I ran into this on a new RedHat machine, where $LANG is
>apparently being set by default.
>
>Here are a few particulars:
>address@hidden ldd $(type -p grep)
>        libpcre.so.0 => /lib/libpcre.so.0 (0x40024000)
>        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
>        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
>address@hidden ls -l /lib/tls/libc.so.6
>lrwxrwxrwx    1 root     root           13 Nov 17  2003 /lib/tls/libc.so.6 -> libc-2.3.2.so*
>address@hidden ls -l /lib/libpcre.so.0
>lrwxrwxrwx    1 root     root           16 Nov 17  2003 /lib/libpcre.so.0 -> libpcre.so.0.0.1*
>address@hidden egrep --version
>egrep (GNU grep) 2.5.1
>
>Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>This is free software; see the source for copying conditions. There is NO
>warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
>Mike
>
>Mike Coleman, Scientific Programmer, +1 816 926 4419
>Stowers Institute for Biomedical Research
>1000 E. 50th St., Kansas City, MO  64110

This is an interesting test case. It turns out that gawk, which shares
the dfa matcher, has the same problem.  The following patch seems to
fix the problem.  Your line numbers may vary.

Hope this helps.

Arnold Robbins
-----------------------
Thu Jul 15 12:36:25 2004   Arnold D. Robbins    <address@hidden>

        * dfa.c (parse_bracket_exp_mb):  If doing case folding,
        include the other case for regular characters inside [...].

--- dfa.c.save  2004-06-01 19:08:26.000000000 +0300
+++ dfa.c       2004-07-15 12:06:53.000000000 +0300
@@ -689,6 +689,19 @@
          REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
                               work_mbc->nchars + 1);
          work_mbc->chars[work_mbc->nchars++] = (wchar_t)wc;
+         if (case_fold)
+           {
+               wint_t altcase = (wint_t) wc;   /* caseless chars map to themselves */
+
+               if (iswlower((wint_t) wc))
+                 altcase = towupper((wint_t) wc);
+               else if (iswupper((wint_t) wc))
+                 altcase = towlower((wint_t) wc);
+
+               REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
+                              work_mbc->nchars + 1);
+               work_mbc->chars[work_mbc->nchars++] = (wchar_t) altcase;
+           }
        }
     }
   while ((wc = wc1) != L']');
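
For readers without the dfa.c source at hand, the idea behind the change can be
sketched as a standalone program. This is only an illustration under stated
assumptions: `struct wc_set`, `wc_set_add`, `wc_set_add_folded`, and
`wc_set_contains` are hypothetical names invented here and do not exist in
grep or gawk. Only `iswlower`, `iswupper`, `towupper`, and `towlower` are the
standard <wctype.h> calls that the patch itself uses. The sketch adds a second
entry only when the character actually has another case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the list of wide characters that a
   multibyte bracket expression collects; not the real dfa.c struct. */
struct wc_set
{
  wchar_t *chars;
  size_t nchars;
  size_t alloc;
};

/* Append one character, growing the array as needed.
   Returns 0 on success, -1 if memory could not be allocated. */
static int
wc_set_add (struct wc_set *s, wchar_t wc)
{
  if (s->nchars == s->alloc)
    {
      size_t n = s->alloc ? 2 * s->alloc : 8;
      wchar_t *p = realloc (s->chars, n * sizeof *p);
      if (!p)
        return -1;
      s->chars = p;
      s->alloc = n;
    }
  s->chars[s->nchars++] = wc;
  return 0;
}

/* Add WC and, when CASE_FOLD is nonzero and WC has another case,
   that counterpart as well.  Caseless characters are added once. */
static int
wc_set_add_folded (struct wc_set *s, wchar_t wc, int case_fold)
{
  if (wc_set_add (s, wc) != 0)
    return -1;
  if (case_fold)
    {
      wint_t altcase = (wint_t) wc;

      if (iswlower ((wint_t) wc))
        altcase = towupper ((wint_t) wc);
      else if (iswupper ((wint_t) wc))
        altcase = towlower ((wint_t) wc);

      if (altcase != (wint_t) wc)
        return wc_set_add (s, (wchar_t) altcase);
    }
  return 0;
}

/* Linear membership test; returns 1 if WC is in the set, else 0. */
static int
wc_set_contains (const struct wc_set *s, wchar_t wc)
{
  for (size_t i = 0; i < s->nchars; i++)
    if (s->chars[i] == wc)
      return 1;
  return 0;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the sketch on its own. */
int
main (void)
{
  struct wc_set s = { 0 };

  /* Model the set for [b] under -i: it should accept both cases. */
  assert (wc_set_add_folded (&s, L'b', 1) == 0);
  printf ("b in set: %d, B in set: %d\n",
          wc_set_contains (&s, L'b'), wc_set_contains (&s, L'B'));
  free (s.chars);
  return 0;
}
#endif
```

With case folding on, the set built for `[b]` holds both `b` and `B`, which is
what the third egrep command in the report needs in order to match `ABC`.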
-- 
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.     arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 530 688 5518
Nof Ayalon              Cell Phone: +972 50  729-7545
D.N. Shimshon 99785     ISRAEL



