[bug #28275] Ranges like [a-z] incorrectly match in UTF systems
From: Norihirio Tanaka
Subject: [bug #28275] Ranges like [a-z] incorrectly match in UTF systems
Date: Thu, 17 Dec 2009 01:30:22 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6
Follow-up Comment #6, bug #28275 (project grep):
The testcase gave me the following results.
% dd if=/dev/urandom bs=1024 count=1024 \
| iconv -c -f ucs-2 -t utf-8 \
| LANG=en_US.UTF8 grep -oha '[a-z]' \
| hexdump -C \
| sed -e 's/^[^ ]*//; s/|.*//; s/ 0a/\n/g' \
| sed -e 's/^ *//; s/  */ /g; /^$/d'
c5 a3
c5 b7
c5 ad
77
c4 81
c4 89
c5 9b
68
c2 aa
c5 b5
c4 a7
c3
a8
c5 a9
c3 a6
c4 b8
c3 ae
78
c4 ab
c3 a4
c3 a3
c5 9b
c3 bd
I don't know which characters "c5 a3", "c5 b7", "c5 ad", etc.
represent, but this behavior is by design in glibc.
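
For comparison, here is a minimal sketch (not part of the original test case) that feeds the first byte sequence from the output above, "c5 a3", to the same [a-z] range under the C locale, where ranges are evaluated by byte value. It assumes a POSIX shell and a grep that honors LC_ALL; the variable names are only illustrative. The result under a UTF-8 locale is deliberately not shown, since it depends on the installed glibc and grep versions.

```shell
# The UTF-8 bytes c5 a3, written as octal escapes so that any POSIX printf accepts them.
sample=$(printf '\305\243')

# Count matching lines in the C locale, where [a-z] covers only the bytes 0x61-0x7a.
# grep exits with status 1 when nothing matches, hence the "|| true".
count_multibyte=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[a-z]' || true)

# Control: a plain ASCII letter (77 = "w" in the output above) still matches.
count_ascii=$(printf 'w\n' | LC_ALL=C grep -c '[a-z]' || true)

echo "c5 a3 in C locale: $count_multibyte matching line(s)"
echo "w in C locale: $count_ascii matching line(s)"
```

Replacing LC_ALL=C with LANG=en_US.UTF8 on the same input is one way to check whether a given system reproduces the matches reported here.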
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?28275>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/