Re: Dealing with character ranges in grep

bug-gnulib

From:	Paolo Bonzini
Subject:	Re: Dealing with character ranges in grep
Date:	Thu, 09 Jun 2011 12:41:16 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.10

On 06/09/2011 11:58 AM, Bruno Haible wrote:

Paolo,

[=e=] to match "e" as well as accented versions like é, è and ê).
That is the one feature that you get with glibc, and that you would
sacrifice when building --with-included-regex.


I agree.  It's up to distros to choose, of course.
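
For readers unfamiliar with equivalence classes, here is a rough sketch of what [=e=] is meant to accept. It only approximates the idea with Unicode decomposition; glibc derives equivalence classes from the locale's collation data, not from this method, and the helper names below are hypothetical.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of the NFD decomposition of ch."""
    return unicodedata.normalize("NFD", ch)[0]


def in_equivalence_class(ch: str, representative: str) -> bool:
    """Approximate [=representative=]: same base letter, accents ignored.

    Assumption: this stands in for the locale-defined equivalence class,
    which a real regex implementation reads from collation tables.
    """
    return base_letter(ch) == base_letter(representative)


# e, e-acute, e-grave, e-circumflex, plus two characters that should not match
candidates = ["e", "\u00e9", "\u00e8", "\u00ea", "f", "E"]
matches = [c for c in candidates if in_equivalence_class(c, "e")]
print(matches)
```

The accented variants are accepted, while "f" and uppercase "E" are rejected in this approximation.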


If you are on the point of sacrificing a glibc feature in many programs,
then IMO you should first talk with the glibc people to see what alternative
they can offer.

No, I'm not! It's not any different from now. Right now, some distros/people use --with-included-regex and get broken semantics + no equivalence classes; others use --without-included-regex and get another kind of broken semantics.

With my proposal, distros/people that use --with-included-regex would get understandable semantics + no equivalence classes; others will see no change.
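
To make the contrast between range semantics concrete, here is a toy model. It rests on two assumptions that this message does not state: that "understandable" means plain code point order, and that the locale collates letters as aAbB...zZ. That sequence is hypothetical and not copied from any real glibc locale.

```python
import string

# Hypothetical collation sequence aAbBcC...zZ (an assumption for illustration).
COLLATION = [
    c
    for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
    for c in pair
]
RANK = {c: i for i, c in enumerate(COLLATION)}


def in_range_codepoint(ch: str, lo: str, hi: str) -> bool:
    """Range membership by code point value, as in the C locale."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collation(ch: str, lo: str, hi: str) -> bool:
    """Range membership by position in the toy collation sequence."""
    return ch in RANK and RANK[lo] <= RANK[ch] <= RANK[hi]


for ch in "aBYZz":
    print(ch, in_range_codepoint(ch, "a", "z"), in_range_collation(ch, "a", "z"))
```

Under the toy collation order, [a-z] accepts "B" and "Y" but rejects "Z", which is the kind of surprise a code point interpretation avoids.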


I don't plan to change the default between the two.

It is probably futile to ask Ulrich Drepper to change how [a-z] is interpreted
by default.

I think it would be possible to discuss it civilly with Uli (not on Bugzilla though). Unfortunately, more glibc development now seems to be done by someone I shall not name who sports twice the arrogance and half the knowledge/talent.

But what would gnulib need so as to implement our "desired"
behaviour? As far as I understand, you want to keep the interpretation of
[=e=] in the POSIX + glibc way, but change the interpretation of [a-z]?

That's a different story. If we could implement [=e=] in gnulib code using glibc extensions, I would be all for that. But even right now, using gnulib's regex means sacrificing [=e=]. So that's a separate topic.

The only possible consequence is that with this change more distros may be using --with-included-regex. That's their choice, not ours.

Then, what do we need from glibc?
   - Do we need a RE_RANGES_IGNORE_LOCALES flag, like Arnold proposed?

No, that would be really really bad to have, for the reasons I mentioned in my original email.

   - Do we need an API that allows us to access the collation elements?
     (Or is strcoll and wcscoll sufficient?)

No, they're not, and I thought about designing such an API last year, but in the end decided that the locale behavior of regex is irremediably broken. For example, when you have a collation element, you can match it using ranges (e.g. [d-i] matches "ch" in Czech; "ch" collates after "h"), and even apply negation (e.g. [^c-h] matches "ch" too). However, there is no way to anchor your match to the beginning of the collation element. So "chci" matches both /[c-h]+ci/ and /[^c-h]+ci/. It is beyond repair, and [=e=] is the only part that can be salvaged.
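
The "chci" example can be reproduced with a small toy matcher. This is a sketch under stated assumptions, not glibc's algorithm: the collation order is plain a-z with "ch" inserted after "h" (real Czech collation has more elements), and the matcher is allowed to consume either a single character or a multi-character collating element at each position. All names are hypothetical.

```python
import string

# Simplified Czech-like order: ... g, h, ch, i, ... (assumption for illustration).
ELEMENTS = list(string.ascii_lowercase)
ELEMENTS.insert(ELEMENTS.index("h") + 1, "ch")
RANK = {e: i for i, e in enumerate(ELEMENTS)}


def bracket_steps(text, pos, lo, hi, negate):
    """Lengths that [lo-hi] (or [^lo-hi]) may consume at pos.

    Every collating element that starts at pos is a candidate, so "c"
    and "ch" are both tried when the text continues with "ch".
    """
    steps = []
    for elem in ELEMENTS:
        if text.startswith(elem, pos):
            inside = RANK[lo] <= RANK[elem] <= RANK[hi]
            if inside != negate:
                steps.append(len(elem))
    return steps


def matches_plus_then(text, lo, hi, negate, tail):
    """Does the whole text match [lo-hi]+tail (or [^lo-hi]+tail)?"""

    def go(pos, consumed_any):
        if consumed_any and text[pos:] == tail:
            return True
        return any(
            go(pos + n, True)
            for n in bracket_steps(text, pos, lo, hi, negate)
        )

    return go(0, False)


print(bracket_steps("ch", 0, "d", "i", False))            # [d-i] against "ch"
print(matches_plus_then("chci", "c", "h", False, "ci"))   # /[c-h]+ci/
print(matches_plus_then("chci", "c", "h", True, "ci"))    # /[^c-h]+ci/
```

In this model the positive range succeeds by reading "c" and "h" as separate characters, and the negated range succeeds by reading "ch" as one element that sorts after "h", so both patterns accept the same string.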


Paolo
