Re: [bug #22100] Enhancement request:
From: Aaron Crane
Subject: Re: [bug #22100] Enhancement request:
Date: Tue, 12 Feb 2008 11:53:24 +0000
User-agent: Mutt/1.5.15+20070412 (2007-04-11)
Ed Avis writes:
> Tony Abou-Assaleh wrote:
> >The and operator can be done as follows:
> >
> >grep -E '(re1.*re2)|(re2.*re1)'
>
> That doesn't work for all cases, for example (with -E)
>
> re1=^hello
> re2=^\w+
I'm assuming you meant
    re1=hello
    re2=^\w+
because otherwise, requiring both of those to match is the same as
requiring just ^hello to match.
But it's still possible to write a regex that matches exactly when
both of the revised regexes do. I believe it would look something
like this:
    ^(hello|\w.*hello)
The closure of regular expressions under conjunction doesn't guarantee
that you get a short or non-repetitive regex out at the end, only that
you can get some regex.
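For anyone who wants to check that, here is a rough sketch in Python rather than grep. It assumes Python's re module treats these particular patterns the way grep -E does, which I believe holds for plain ASCII input; the sample lines and the helper names are made up for illustration.

```python
import re

# The two revised patterns, and the hand-written conjunction of them.
RE1 = re.compile(r"hello")
RE2 = re.compile(r"^\w+")
BOTH = re.compile(r"^(hello|\w.*hello)")

# The generic '(re1.*re2)|(re2.*re1)' recipe applied to the same pair.
NAIVE = re.compile(r"(hello.*^\w+)|(^\w+.*hello)")

# Illustrative sample lines (assumed; not taken from the thread).
SAMPLES = ["hello", "hello world", "say hello", "hhello",
           " hello", "-hello", "help", ""]

def matches_both(line):
    """True when RE1 and RE2 each match somewhere in the line."""
    return bool(RE1.search(line)) and bool(RE2.search(line))

def matches_combined(line):
    """True when the single hand-written regex matches the line."""
    return bool(BOTH.search(line))

# Lines that satisfy both patterns but that the generic recipe misses,
# because the two matches overlap at the start of the line.
naive_misses = [s for s in SAMPLES
                if matches_both(s) and not NAIVE.search(s)]

for line in SAMPLES:
    print(repr(line), matches_both(line), matches_combined(line))
print("missed by the generic recipe:", naive_misses)
```

On these samples the hand-written regex agrees with "both match" everywhere, while the generic recipe misses the lines where "hello" sits at the very start.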
> I wonder if perl5 so-called regular expressions are 'closed under
> and' in this way. Or if a perl5 regexp can be used to give the
> conjunction of two plain grep regexps, but not necessarily of two
> perl5 regexps.
I'm no mathematician, but I think that traditional BREs are closed
under disjunction if you can do the equivalent of alpha-renaming on
backreferences. Closure under conjunction in the presence of backrefs
sounds trickier. Handling full PCRE regexes sounds trickier still;
for real Perl regexes it is almost certainly impossible, given the
existence of explicitly procedural constructs, including execution of
arbitrary Perl code.
> As for --not, perl5 regexps do support negation, I think: the
> pattern ^(?!x)$ matches all lines except those matching ^x$.
Not quite; it matches only the empty string. You can always understand
lookaround as being a zero-width assertion that provides an additional
constraint on what may match. So to understand ^(?!x)$ , first take
out the lookahead, leaving just ^$ which clearly matches only the
empty string. Then the (?!x) just says "and also fail if there's an x
immediately after the start of the empty string we're matching", which
has no interesting effect.
Given lookaround, "all lines except those matching ^x$" is just
^(?!x$) which says "match the beginning of the string, except where
that is immediately followed by an x and the end of the string".
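The difference between the two patterns is easy to see on a handful of lines. This is a small sketch using Python's re module as a stand-in for a Perl-style engine; the sample lines are invented for illustration, and each one is a single line with no trailing newline.

```python
import re

# Lookahead followed by '$': can only match where the line is empty.
WRONG = re.compile(r"^(?!x)$")
# '$' inside the lookahead: rejects only the line that is exactly "x".
RIGHT = re.compile(r"^(?!x$)")

# Illustrative sample lines (assumed).
LINES = ["", "x", "xy", "y", "ax"]

wrong_hits = [line for line in LINES if WRONG.search(line)]
right_hits = [line for line in LINES if RIGHT.search(line)]

print("^(?!x)$ matches:", wrong_hits)
print("^(?!x$) matches:", right_hits)
```

The first pattern selects only the empty line; the second selects every line except "x".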
I think it's always mathematically possible to express "anything
except regex R" using lookaround, but it can certainly be hard to
write such regexes.
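For simple patterns, one mechanical way to do it is to wrap R in a negative lookahead anchored at the start of the line. The sketch below is only that: negate() is a hypothetical helper, Python's re module stands in for a Perl-style engine, and I have not tried to cover patterns that carry inline flags or other constructs that would not survive being wrapped. It also assumes each input is a single line without a newline.

```python
import re

def negate(pattern):
    """Build a regex matching lines that `pattern` does NOT match anywhere.

    Hypothetical helper: wraps the pattern as ^(?!.*(?:pattern)).
    The non-capturing group keeps alternations and backreference
    numbering inside the pattern intact.
    """
    return re.compile(r"^(?!.*(?:" + pattern + r"))")

# Illustrative patterns and lines (assumed).
PATTERNS = [r"x", r"^x$", r"a|b", r"(a)\1"]
LINES = ["", "x", "xy", "aa", "b", "hello"]

def agrees(pattern, line):
    """Check the wrapped regex against plain 'does not match'."""
    return bool(negate(pattern).search(line)) == (re.search(pattern, line) is None)

for pattern in PATTERNS:
    print(pattern, [line for line in LINES if negate(pattern).search(line)])
```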
> >Making grep do more with less is on my radar, but it is not a
> >priority at the moment. There are some serious bugs that need to be
> >fixed first.
>
> Understood.
For the record, I'm strongly in favour of an option --all which would
require all -e patterns to match, and being able to negate selected
patterns would also be helpful.
But, yes, I also understand the need to fix the existing bugs before
finding exciting places for new ones to hide. :-)
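To make the proposed semantics concrete, here is a sketch of what I mean by --all with optional negation, written in Python. None of this exists in grep: grep_all() and its negated parameter are hypothetical names, and the sample lines are invented.

```python
import re

def grep_all(lines, patterns, negated=()):
    """Keep lines that match every pattern and none of the negated ones.

    Hypothetical model of an '--all' option with per-pattern negation.
    """
    required = [re.compile(p) for p in patterns]
    forbidden = [re.compile(p) for p in negated]
    return [
        line for line in lines
        if all(r.search(line) for r in required)
        and not any(f.search(line) for f in forbidden)
    ]

# Illustrative input (assumed).
LINES = ["hello world", "say hello", " hello", "goodbye world"]

# Both patterns must match.
selected = grep_all(LINES, [r"hello", r"^\w+"])
# One pattern must match, another must not.
without_world = grep_all(LINES, [r"hello"], negated=[r"world"])

print(selected)
print(without_world)
```

Each pattern stays short and readable, and nobody has to construct the combined regex by hand.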
--
Aaron Crane