gawk 3.1.4: all character classes match "[" in utf-8 locales
From: James Troup
Date: Fri, 26 Nov 2004 18:45:02 +0000
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)
Hi,
Last one... :-)
A Debian developer, Fumitoshi UKAI <address@hidden>, reported[1]
the following bug in gawk 3.1.4:
| This is the same bug as Bug#274352 in grep.
|
| % echo '[' | LANG=en_US.UTF-8 gawk '/[[:space:]]/ { print }'
| [
| %
|
| This bug can be fixed by this patch
|
| At the beginning of processing a [[-type construct, wc (= '[')
| is saved in wc1 for the backtrack case. But if the construct is
| processed normally, wc1 remains '[' and is added as a normal character
| to work_mbc->chars[]. This is why '[' matches [[:space:]] and other
| character classes.
|
| I think wc1 should be set to WEOF at the end of processing [[-type
| constructs.
|
| --- dfa.c~ 2004-10-19 01:18:31.000000000 +0900
| +++ dfa.c 2004-10-19 02:53:28.000000000 +0900
| @@ -645,7 +645,7 @@
| work_mbc->coll_elems[work_mbc->ncoll_elems++] = elem;
| }
| }
| - wc = WEOF;
| + wc = wc1 = WEOF;
| }
| else
| /* We treat '[' as a normal character here. */
--
James
[1] http://bugs.debian.org/277122