bug-grep

Re: character ranges in regular expressions


From: Paolo Bonzini
Subject: Re: character ranges in regular expressions
Date: Tue, 05 Oct 2010 10:31:12 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100907 Fedora/3.1.3-1.fc13 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.3

On 10/04/2010 10:51 PM, Eric Blake wrote:
On 10/04/2010 02:43 PM, Aharon Robbins wrote:
Which is why my proposal is that glibc consider:

[A-Z] => match C locale; 26 letters, regardless of locale
[[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
with collation symbols (26 letters in the POSIX locale, 51 or even more in
other locales, since accented characters might be included in the
collation range), so that we aren't completely losing CEO (collation
element order) behavior (if someone seriously has a reason to use it)
[[:upper:]] => per POSIX rules in all locales

This would be great. In what must be close to (or more than) the
10 years since gawk started supporting locales, I have yet to meet
anyone who thinks that [a-z] matching [A-Y] is a feature!

Great idea or not, Uli rejected it :(

------- Additional Comments From drepper dot fsp at gmail dot com
2010-10-04 02:42 -------
This stays as it is. If individual locale maintainers think the current
behavior is unintentionally as-is then they can change it. But in general
this is the long-implemented behavior and won't be changed. Collating
elements are just not really useful outside the POSIX locale or when the
locale is guaranteed to stay the same.

No, Uli rejected a sweeping change of all locales implementing CEO as aAbBcCdD. He instead left it up to individual locale maintainers. Note that CEO in en_US locale is a-zA-Z, which is also different from ASCII.

He didn't say anything about implementing [a-z] using code points and [[.a.]-[.z.]] using CEO or whatever else. In principle it's even possible to modify localedef so that __collseq_table_lookup represents strcoll order rather than CEO.
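
To make the difference between these orderings concrete, here is a minimal sketch in Python. It assumes three toy, letter-only orderings (code point order, an interleaved aAbB...zZ order, and an a-zA-Z order) instead of real locale data, and the names in_range, range_members, CODEPOINT, INTERLEAVED and LOWER_FIRST are made up for illustration; nothing here is glibc's or grep's actual implementation.

```python
import string

def in_range(ch, lo, hi, order):
    """Return True if ch sorts between lo and hi (inclusive) in `order`."""
    rank = {c: i for i, c in enumerate(order)}
    return rank[lo] <= rank[ch] <= rank[hi]

def range_members(lo, hi, order):
    """List, in `order`, every character that the range [lo-hi] would match."""
    return "".join(c for c in order if in_range(c, lo, hi, order))

# Toy orderings over the 52 ASCII letters only (assumption: real ASCII also
# has punctuation between 'Z' and 'a', and real locales have far more entries).
CODEPOINT = string.ascii_uppercase + string.ascii_lowercase      # A..Z a..z
INTERLEAVED = "".join(l + u for l, u in zip(string.ascii_lowercase,
                                            string.ascii_uppercase))  # aAbB...zZ
LOWER_FIRST = string.ascii_lowercase + string.ascii_uppercase    # a..z A..Z

if __name__ == "__main__":
    for name, order in [("code point", CODEPOINT),
                        ("aAbB...zZ", INTERLEAVED),
                        ("a-zA-Z", LOWER_FIRST)]:
        matched = range_members("a", "z", order)
        print(f"[a-z] in {name} order matches {len(matched)} letters: {matched}")
```

Under the interleaved order, [a-z] picks up every letter except Z, which is the "[a-z] matching [A-Y]" surprise mentioned above. Under the a-zA-Z order, [a-z] and [A-Z] look the same as in code point order, but a mixed range such as [a-Z] spans all 52 letters while [A-z] spans none, so that order is still not equivalent to ASCII.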

Paolo


