bug-gnu-utils

gawk 3.1.4: all character classes match "[" in utf-8 locales


From: James Troup
Subject: gawk 3.1.4: all character classes match "[" in utf-8 locales
Date: Fri, 26 Nov 2004 18:45:02 +0000
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Hi,

Last one... :-)

A Debian developer, Fumitoshi UKAI <address@hidden>, reported[1]
the following bug in gawk 3.1.4:

| This is the same bug as Bug#274352 in grep.
| 
|  % echo '[' | LANG=en_US.UTF-8 gawk '/[[:space:]]/ { print }'
|  [
|  %
| 
| This bug can be fixed by this patch
| 
|   At the beginning of processing a "[[" type of construct, wc (= '[')
|   is stored in wc1 for the backtrack case.  But if the construct is
|   processed normally, wc1 remains '[' and is put into work_mbc->chars[]
|   as a normal character.  This is why '[' matches [[:space:]] and the
|   other character classes.
| 
|   I think wc1 should be set to WEOF at the end of processing a "[["
|   type of construct.
| 
| --- dfa.c~      2004-10-19 01:18:31.000000000 +0900
| +++ dfa.c       2004-10-19 02:53:28.000000000 +0900
| @@ -645,7 +645,7 @@
|                       work_mbc->coll_elems[work_mbc->ncoll_elems++] = elem;
|                     }
|                 }
| -             wc = WEOF;
| +             wc = wc1 = WEOF;
|             }
|           else
|             /* We treat '[' as a normal character here.  */
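
To make the quoted explanation concrete, here is a toy model of the
stale-variable problem.  It is only a sketch: toy_set, toy_parse and
the reset_wc1 switch are hypothetical names invented for illustration,
and the control flow is heavily simplified, so this is not the real
dfa.c code.  Only wc, wc1 and WEOF come from the report.

```c
#include <assert.h>
#include <wchar.h>

/* Toy model only: names and control flow are illustrative assumptions,
   not the actual dfa.c implementation.  */
#define TOY_MAX 16

struct toy_set
{
  wchar_t chars[TOY_MAX];   /* literal members of the bracket expression */
  int nchars;
  int nclasses;             /* number of [:class:] members seen */
};

/* Parse the body of a bracket expression; P points just past the
   opening '['.  With RESET_WC1 == 0 the saved '[' is left behind (the
   reported behaviour); otherwise it is cleared, as the patch proposes.  */
static void
toy_parse (const wchar_t *p, struct toy_set *set, int reset_wc1)
{
  wint_t wc, wc1 = WEOF;

  assert (p != NULL && set != NULL);
  set->nchars = set->nclasses = 0;

  while (*p != L'\0' && (wc = *p++) != L']')
    {
      if (wc == L'[' && *p == L':')
        {
          const wchar_t *end = wcsstr (p + 1, L":]");
          wc1 = wc;               /* keep '[' in case we must backtrack */
          if (end != NULL)
            {
              /* The class was processed normally.  */
              set->nclasses++;
              p = end + 2;        /* skip past ":]" */
              wc = WEOF;
              if (reset_wc1)
                wc1 = WEOF;       /* the one-line fix */
            }
        }

      if (wc1 != WEOF)
        {
          /* A leftover saved character is stored as a literal.  */
          if (set->nchars < TOY_MAX)
            set->chars[set->nchars++] = (wchar_t) wc1;
          wc1 = WEOF;
        }
      else if (wc != WEOF && set->nchars < TOY_MAX)
        set->chars[set->nchars++] = (wchar_t) wc;
    }
}
```

In this model, toy_parse (L"[:space:]]", &set, 0) records one class and
also leaves '[' in set.chars, which mirrors the symptom above.  With
reset_wc1 set to 1 the same input records the class and no literal
characters.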

-- 
James

[1] http://bugs.debian.org/277122



