bug-gnulib

From: Paolo Bonzini
Subject: implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]
Date: Thu, 09 Jun 2011 13:32:02 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.10

On 06/09/2011 01:12 PM, Bruno Haible wrote:
> What would it take to let distros/people use --with-included-regex and
> get understandable semantics for ranges + working equivalence classes?
>
> I would prefer that to your proposal, because it cannot be seen as a
> regression by people who care about equivalence classes.

My proposal wouldn't change defaults, which is why I believe that this is a separate topic. You quoted

> With my proposal, distros/people that use --with-included-regex would
> get understandable semantics + no equivalence classes

but snipped this part:

> Right now, some distros/people use --with-included-regex and get
> broken semantics + no equivalence classes

So, I agree that understandable semantics for ranges + working equivalence classes would be the best, and if gnulib could provide that, I would champion making --with-included-regex the default. However, 1) Aharon would like to release gawk 4.0 in the very near future, and 2) adding an extension to glibc takes time. That's why I prefer to work in smaller steps.

> Can that be done through gnulib code? If not, what do we need from glibc
> to get it done in gnulib?

We'd need glibc to export two functions in both multi-byte and wide-character versions:

1) streqcoll(S1, S2) and wcseqcoll(S1, S2) would be the same as strcoll and wcscoll, but they would compare only according to primary weights. A slightly more formal definition is that streqcoll(S1, S2) == 0 iff S1 matches the `[=C1=][=C2=][=C3=]...[=Cn=]' regular expression, where Ci are the characters of S2 (I'd need to double check this against POSIX though). When non-zero, the result of streqcoll(S1, S2) would be the same as strcoll(S1, S2). Likewise, glibc could provide streqxfrm and wcseqxfrm, with the definition that strcmp(streqxfrm(S1), streqxfrm(S2)) == streqcoll(S1, S2).
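
To make the intended semantics concrete, here is a rough sketch in C. Nothing in it exists in glibc: toy_streqxfrm and toy_streqcoll are hypothetical stand-ins for the proposed functions, and toy_primary_weight is a made-up weight table that simply treats upper and lower case ASCII letters as equivalent. A real implementation would read primary weights from the locale's LC_COLLATE data.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy assumption: the "primary weight" of a byte is its lowercase form,
   so 'a' and 'A' land in the same equivalence class.  Real locales
   define primary weights in their LC_COLLATE tables. */
static unsigned char toy_primary_weight(unsigned char c)
{
    return (unsigned char) tolower(c);
}

/* Hypothetical streqxfrm: returns a malloc'ed key that keeps only the
   primary weights of s.  The caller frees the result. */
char *toy_streqxfrm(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    char *key = malloc(n + 1);
    if (key == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        key[i] = (char) toy_primary_weight((unsigned char) s[i]);
    key[n] = '\0';
    return key;
}

/* Hypothetical streqcoll: 0 when the primary-weight keys are equal,
   otherwise whatever strcoll reports for the original strings. */
int toy_streqcoll(const char *s1, const char *s2)
{
    char *k1 = toy_streqxfrm(s1);
    char *k2 = toy_streqxfrm(s2);
    int same = (k1 != NULL && k2 != NULL && strcmp(k1, k2) == 0);
    free(k1);
    free(k2);
    return same ? 0 : strcoll(s1, s2);
}
```

With this toy table, toy_streqcoll("Strasse", "strasse") is 0, while toy_streqcoll("abc", "abd") falls through to strcoll and is non-zero.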

2) On top of this, [.ss.] could be implemented using an additional function mbelemlen(S) giving the length of the first collation element in S. [.S1.] would be rejected unless mbelemlen(S1) == strlen(S1), and [.S1.] would match S2 if strcoll(S1, strndup(S2, mbelemlen(S2))) == 0. wcelemlen could be provided likewise.
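
The same idea can be sketched for collating symbols. Again, this is only an illustration under stated assumptions: toy_mbelemlen is a hypothetical stand-in for the proposed mbelemlen, and the table of multi-character collating elements ("ch" and "ss") belongs to an invented locale. The sketch copies the first element into a temporary buffer and frees it, rather than using strndup directly as in the formula above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy assumption: this made-up locale has exactly two multi-character
   collating elements. */
static const char *const toy_elements[] = { "ch", "ss" };

/* Hypothetical mbelemlen: byte length of the first collation element in s. */
size_t toy_mbelemlen(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (size_t i = 0; i < sizeof toy_elements / sizeof toy_elements[0]; i++) {
        size_t n = strlen(toy_elements[i]);
        if (strncmp(s, toy_elements[i], n) == 0)
            return n;
    }
    return 1;  /* every other byte is its own collation element */
}

/* [.sym.] is accepted only if sym is exactly one collation element.
   Rejecting the empty string is an extra choice made in this sketch. */
bool toy_collsym_valid(const char *sym)
{
    return *sym != '\0' && toy_mbelemlen(sym) == strlen(sym);
}

/* Does [.sym.] match the first collation element of text? */
bool toy_collsym_matches(const char *sym, const char *text)
{
    if (!toy_collsym_valid(sym))
        return false;
    size_t n = toy_mbelemlen(text);
    char *head = malloc(n + 1);
    if (head == NULL)
        return false;
    memcpy(head, text, n);
    head[n] = '\0';
    bool match = strcoll(sym, head) == 0;
    free(head);
    return match;
}
```

In this toy locale [.ss.] matches at the start of "ssh", [.s.] does not (the first element there is "ss", not "s"), and [.sx.] is rejected because "sx" is not a single collation element.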

These are the minimal extensions that would be required to support full regular expression features portably and in a manner that is compatible with glibc, except for ranges (which we don't care about, do we?).

Paolo


