[bug-gawk] different results from same patterns


From: Norihiro Tanaka
Subject: [bug-gawk] different results from same patterns
Date: Sun, 24 Jul 2016 19:01:33 +0900

Hi,

In the el_GR.iso88597 locale, the bytes \323, \362 and \363 represent the following characters:

  \323 (0xd3) SIGMA
  \362 (0xf2) final sigma
  \363 (0xf3) sigma

This is a single-byte locale (MB_CUR_MAX == 1).  \323 is upper case,
whereas \362 and \363 are lower case.  The lower-case form of \323 is
\363, and the upper-case form of both \362 and \363 is \323.
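Given these mappings, one plausible model of a case-insensitive match
treats two bytes as equal when they share the same upper-case form,
which puts all three bytes into a single class.  The sketch below checks
that in plain POSIX sh.  The fold_upper helper and its hard-coded tr
mapping are hypothetical illustrations (chosen so that el_GR.iso88597
need not be installed), not gawk's code.

```shell
#!/bin/sh
# Hypothetical model, NOT gawk's code: two bytes compare equal ignoring
# case when they share an upper-case form in el_GR.iso88597.
# The mapping is hard-coded so the script runs in the plain C locale.
LC_ALL=C
export LC_ALL

# Hypothetical helper: fold \362 (final sigma) and \363 (sigma) to \323 (SIGMA).
fold_upper() {
  tr '\362\363' '\323\323'
}

# The three input lines used in the test, already folded to upper case.
folded=$(printf 'b\323\nb\362\nb\363\n' | fold_upper)

for p in 323 362 363; do
  # Fold the pattern "b" + byte the same way as the input.
  pat=$(printf "b\\$p" | fold_upper)
  n=0
  matches=
  # Collect the numbers of the folded lines that contain the pattern.
  while IFS= read -r line; do
    n=$((n + 1))
    case $line in
      *"$pat"*) matches="$matches $n" ;;
    esac
  done <<EOF
$folded
EOF
  printf '\\%s matches lines:%s\n' "$p" "$matches"
done
```

Under this model every pattern matches lines 1 2 3, which is the same
set of lines that the dfa runs report.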

I ran the following test:

(
LC_ALL=el_GR.iso88597
export LC_ALL

# Input: "b" followed by SIGMA (\323), final sigma (\362) and sigma (\363).
printf 'b\323\nb\362\nb\363\n' >in

# Plain patterns (results in outN-dfa).
pat=$(printf '\323\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in | sed -e 's/^b//' >out1-dfa
pat=$(printf '\362\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in | sed -e 's/^b//' >out2-dfa
pat=$(printf '\363\n'); ./gawk "BEGIN { IGNORECASE = 1 } /b$pat/ { print }" in | sed -e 's/^b//' >out3-dfa

# Patterns starting with the [a-c] bracket expression (results in outN-regex).
pat=$(printf '\323\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in | sed -e 's/^b//' >out1-regex
pat=$(printf '\362\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in | sed -e 's/^b//' >out2-regex
pat=$(printf '\363\n'); ./gawk 'BEGIN { IGNORECASE = 1 } /[a-c]'"$pat/ { print }" in | sed -e 's/^b//' >out3-regex

# Print the first line of a hex dump of each output file.
for out in out[1-3]-*; do echo "$out"; od -tx1 "$out" | head -1; done
)

result:

out1-dfa
0000000 d3 0a f2 0a f3 0a
out1-regex
0000000 d3 0a
out2-dfa
0000000 d3 0a f2 0a f3 0a
out2-regex
0000000 f2 0a f3 0a
out3-dfa
0000000 d3 0a f2 0a f3 0a
out3-regex
0000000 f2 0a f3 0a

I expect each dfa result to be identical to the corresponding regex
result: out1-dfa to out1-regex, out2-dfa to out2-regex, and out3-dfa
to out3-regex.
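That expectation can be checked mechanically once the test has run.  A
minimal sketch using cmp(1); compare_outputs is a hypothetical helper
name, and it assumes it is run in the directory holding the out*-dfa
and out*-regex files.

```shell
#!/bin/sh
# Hypothetical helper: report whether each dfa result equals its regex
# counterpart. Returns 0 only when all three pairs are byte-for-byte identical.
compare_outputs() {
  status=0
  for i in 1 2 3; do
    if cmp -s "out$i-dfa" "out$i-regex"; then
      echo "out$i: same"
    else
      echo "out$i: differ"
      status=1
    fi
  done
  return "$status"
}
```

With the results shown above, calling compare_outputs would print
"differ" for all three pairs and return a non-zero status.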

I reported an issue with the fastmap in regex at
https://sourceware.org/bugzilla/show_bug.cgi?id=20381, but I think that
this is a different issue.  Gawk uses RE_ICASE with the dfa matcher,
but in single-byte locales it uses CASETABLE instead of RE_ICASE with
the regex matcher.

I propose the attached patch, but I am not confident that it is correct.

Thanks,
Norihiro

Attachment: 0001-fix-for-different-results-from-same-patterns.patch
Description: Text document

