From: James Troup
Subject: wrong behaviour of [:upper:] and/or [:lower:] if IGNORECASE is set and locale's mb_cur_max > 1
Date: Fri, 26 Nov 2004 18:40:12 +0000
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)
Hi,
A Debian developer, Fumitoshi UKAI <address@hidden>, reported[1]
the following bug in gawk:
| In all locales where mb_cur_max > 1, such as CJK or UTF-8 locales,
| [:upper:] and/or [:lower:] don't work as expected if IGNORECASE is set.
|
| For example,
| % echo aaa | LANG=C gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
| aaa
| # correct, a matches [:upper:] when IGNORECASE=1
|
| % echo aaa | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
| %
| # wrong, a doesn't match [:upper:] when IGNORECASE=1
|
| With GAWK_NO_DFA=1 set, it works correctly, just as it does with LANG=C.
|
| Checking the source code, I found this chunk in
| regcomp.c:build_charclass()
| (the same code is found in glibc):
|
| if ((syntax & RE_ICASE)
| && (strcmp (class_name, "upper") == 0 || strcmp (class_name, "lower") == 0))
| class_name = "alpha";
|
| However, dfa.c does not apply the same mapping, so this patch fixes the
| problem:
|
| --- dfa.c.orig 2004-10-12 19:38:48.000000000 +0900
| +++ dfa.c 2004-10-12 19:38:11.000000000 +0900
| @@ -596,6 +596,9 @@
| {
| wctype_t wt;
| /* Query the character class as wctype_t. */
| + if (case_fold && (strcmp(str, "upper") == 0 || strcmp(str, "lower") == 0)) {
| + strcpy(str, "alpha");
| + }
| wt = wctype (str);
|
| if (ch_classes_al == 0)
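
To illustrate the mapping the patch relies on, here is a minimal standalone
sketch. It is not code from dfa.c or regcomp.c: the helper names
effective_class, in_class and self_check are hypothetical, and the check
assumes the default "C" locale. It only shows that treating "upper" and
"lower" as "alpha" under case folding makes 'a' match [:upper:].

```c
#include <assert.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical helper (not part of dfa.c): choose the class to query.
   Under case folding, "upper" and "lower" both widen to "alpha",
   mirroring the regcomp.c snippet quoted above. */
static const char *
effective_class (const char *name, int case_fold)
{
  if (case_fold
      && (strcmp (name, "upper") == 0 || strcmp (name, "lower") == 0))
    return "alpha";
  return name;
}

/* Hypothetical helper: does wide character WC belong to class NAME? */
static int
in_class (wint_t wc, const char *name, int case_fold)
{
  wctype_t wt = wctype (effective_class (name, case_fold));
  return wt != (wctype_t) 0 && iswctype (wc, wt) != 0;
}

/* Self-check, assuming the default "C" locale. */
static void
self_check (void)
{
  assert (!in_class (L'a', "upper", 0));  /* no folding: 'a' is not upper */
  assert (in_class (L'a', "upper", 1));   /* folding: "upper" acts as "alpha" */
  assert (in_class (L'A', "lower", 1));   /* same for "lower" */
  assert (!in_class (L'1', "upper", 1));  /* digits are still excluded */
}
```

Unlike the patch, which overwrites str in place with strcpy, this sketch
returns a replacement name and leaves the caller's string untouched. That is
only a choice made for the illustration, not a claim about what dfa.c needs.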
--
James
[1] http://bugs.debian.org/276201