[bug-gawk] different results from same patterns
From: Norihiro Tanaka
Subject: [bug-gawk] different results from same patterns
Date: Sun, 24 Jul 2016 19:01:33 +0900
Hi,
In el_GR.iso88597, the bytes \323, \362 and \363 mean the following:
\323 (0xd3) SIGMA
\362 (0xf2) stigma
\363 (0xf3) sigma
In this locale MB_CUR_MAX == 1. \323 is upper case, whereas \362 and \363
are lower case. The lower case of \323 is \363, and the upper case of
both \362 and \363 is \323.
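To illustrate why this mapping is awkward for case-insensitive matching, here is a small sketch that models it with explicit byte tables, so that it runs without the Greek locale installed. The helper names fold_upper and fold_lower are hypothetical, and this is not a claim about how gawk, dfa or regex fold case internally; it only shows that folding to upper case and folding to lower case produce different equivalence classes for these three bytes.

```shell
# Model of the case mapping described above, using explicit byte tables.
# LC_ALL=C makes tr operate on single bytes regardless of the caller's locale.
# fold_upper and fold_lower are hypothetical helpers for illustration only.
fold_upper() { LC_ALL=C tr '\362\363' '\323\323'; }  # stigma, sigma -> SIGMA
fold_lower() { LC_ALL=C tr '\323' '\363'; }          # SIGMA -> sigma

# Folding to upper case collapses all three bytes into one class.
printf '\323\362\363' | fold_upper | od -An -tx1

# Folding to lower case leaves stigma (f2) distinct from sigma (f3).
printf '\323\362\363' | fold_lower | od -An -tx1
```

So a matcher that compares upper-case forms would treat all three bytes as equal, while one that compares lower-case forms would not treat stigma as equal to the other two.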
I ran the following test:
(
LC_ALL=el_GR.iso88597
export LC_ALL
printf 'b\323\nb\362\nb\363\n' >in
pat=$(printf '\323\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in |
  sed -e 's/^b//' >out1-dfa
pat=$(printf '\362\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in |
  sed -e 's/^b//' >out2-dfa
pat=$(printf '\363\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in |
  sed -e 's/^b//' >out3-dfa
pat=$(printf '\323\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in |
  sed -e 's/^b//' >out1-regex
pat=$(printf '\362\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in |
  sed -e 's/^b//' >out2-regex
pat=$(printf '\363\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in |
  sed -e 's/^b//' >out3-regex
for out in out[1-3]-*; do echo "$out"; od -tx1 "$out" | head -1; done
)
result:
out1-dfa
0000000 d3 0a f2 0a f3 0a
out1-regex
0000000 d3 0a
out2-dfa
0000000 d3 0a f2 0a f3 0a
out2-regex
0000000 f2 0a f3 0a
out3-dfa
0000000 d3 0a f2 0a f3 0a
out3-regex
0000000 f2 0a f3 0a
I expect out1-dfa to be the same as out1-regex, out2-dfa the same as
out2-regex, and out3-dfa the same as out3-regex.
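This expectation can be checked mechanically. The following is a minimal sketch, assuming the six output files from the test script are in the current directory; compare_outputs is a hypothetical helper name, not part of gawk or its test suite.

```shell
# Hypothetical helper: report whether each dfa/regex output pair is identical.
# Returns 0 only if all three pairs match byte for byte.
compare_outputs() {
    rc=0
    for n in 1 2 3; do
        if cmp -s "out$n-dfa" "out$n-regex"; then
            echo "out$n: same"
        else
            echo "out$n: differ"
            rc=1
        fi
    done
    return $rc
}
```

Running compare_outputs after the test script should print "same" for all three pairs once the two matchers agree.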
I reported a fastmap issue in regex at
https://sourceware.org/bugzilla/show_bug.cgi?id=20381, but I think that
this is another issue. Gawk uses RE_ICASE with the dfa matcher, but in
single-byte locales it uses CASETABLE instead of RE_ICASE with regex.
I propose a patch, but I am not confident that it is correct.
Thanks,
Norihiro
0001-fix-for-different-results-from-same-patterns.patch
Description: Text document