bug-gnu-utils


From: James Troup
Subject: wrong behaviour of [:upper:] and/or [:lower:] if IGNORECASE is set and locale's mb_cur_max > 1
Date: Fri, 26 Nov 2004 18:40:12 +0000
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Hi,

A Debian developer, Fumitoshi UKAI <address@hidden>, reported[1]
the following bug in gawk:

| In all locales where MB_CUR_MAX > 1, such as CJK or UTF-8 locales,
| [:upper:] and/or [:lower:] don't work as expected if IGNORECASE is set.
| 
| For example,
|  % echo aaa | LANG=C gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
|  aaa            
|  # correct, a matches [:upper:] when IGNORECASE=1
| 
|  % echo aaa | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
|  %              
|  # wrong, a doesn't match [:upper:] when IGNORECASE=1
| 
| With GAWK_NO_DFA=1 set, it works correctly, just as it does with LANG=C.
| 
| When I checked the source code, I found this chunk in
| regcomp.c:build_charclass()
| (the same code is found in glibc):
| 
|   if ((syntax & RE_ICASE)
|       && (strcmp (class_name, "upper") == 0
|           || strcmp (class_name, "lower") == 0))
|     class_name = "alpha";
| 
| However, dfa.c doesn't apply the same substitution. This patch fixes
| the problem:
| 
| --- dfa.c.orig  2004-10-12 19:38:48.000000000 +0900
| +++ dfa.c       2004-10-12 19:38:11.000000000 +0900
| @@ -596,6 +596,9 @@
|                 {
|                   wctype_t wt;
|                   /* Query the character class as wctype_t.  */
| +                 if (case_fold && (strcmp(str, "upper") == 0 || strcmp(str, "lower") == 0)) {
| +                         strcpy(str, "alpha");
| +                 }
|                   wt = wctype (str);
|  
|                   if (ch_classes_al == 0)
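
For anyone who wants to see the effect of the substitution outside of
gawk, here is a minimal standalone sketch of the same idea. The helper
names effective_class_name() and in_class() are hypothetical and do not
appear in dfa.c or regcomp.c; only wctype() and iswctype() are standard
library calls. It only illustrates the class-name mapping, not the
surrounding DFA code.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not from gawk or glibc): choose the class name
   to look up.  When matching ignores case, "upper" and "lower" are both
   widened to "alpha", mirroring the check in build_charclass().  */
static const char *
effective_class_name (const char *name, int case_fold)
{
  if (case_fold
      && (strcmp (name, "upper") == 0 || strcmp (name, "lower") == 0))
    return "alpha";
  return name;
}

/* Hypothetical helper: return 1 if wide character WC belongs to the
   named class after the case-fold mapping, 0 otherwise (including the
   case where the class name is unknown in the current locale).  */
static int
in_class (wint_t wc, const char *name, int case_fold)
{
  wctype_t wt = wctype (effective_class_name (name, case_fold));
  return wt != (wctype_t) 0 && iswctype (wc, wt) != 0;
}
```

With this mapping, in_class (L'a', "upper", 1) is nonzero, while
in_class (L'a', "upper", 0) is zero, which is the behaviour the report
expects from [[:upper:]] with and without IGNORECASE.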

-- 
James

[1] http://bugs.debian.org/276201



