bug-grep

Re: [bug #22100] Enhancement request:


From: Aaron Crane
Subject: Re: [bug #22100] Enhancement request:
Date: Tue, 12 Feb 2008 11:53:24 +0000
User-agent: Mutt/1.5.15+20070412 (2007-04-11)

Ed Avis writes:
> Tony Abou-Assaleh wrote:
> >The and operator can be done as follows:
> >
> >grep -E '(re1.*re2)|(re2.*re1)'
> 
> That doesn't work for all cases, for example (with -E)
> 
>     re1=^hello
>     re2=^\w+

I'm assuming you meant

  re1=hello
  re2=^\w+

because otherwise, requiring both of those to match is the same as
requiring just ^hello to match.

But it's still possible to write a regex that matches exactly when
both of the revised regexes do.  I believe it would look something
like this:

  ^(hello|\w.*hello)

The closure of regular expressions under conjunction doesn't guarantee
that you get a short or non-repetitive regex out at the end, only that
you can get some regex.
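
To sanity-check that the combined regex really does match exactly when both revised regexes match, one can compare them over every short string drawn from a small alphabet. The sketch below uses Python's re module rather than grep, on the assumption that ^, \w and .* behave closely enough to grep -E for these particular patterns. The alphabet, the length limit and the function names are made up for illustration.

```python
import itertools
import re

RE1 = re.compile(r"hello")
RE2 = re.compile(r"^\w+")
COMBINED = re.compile(r"^(hello|\w.*hello)")


def both_match(line):
    """True when RE1 and RE2 each match somewhere in the line."""
    return bool(RE1.search(line)) and bool(RE2.search(line))


def combined_matches(line):
    """True when the single hand-written regex matches the line."""
    return bool(COMBINED.search(line))


def disagreements(alphabet="helo -", max_len=6):
    """Return every string (up to max_len) on which the two tests differ."""
    bad = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            line = "".join(chars)
            if both_match(line) != combined_matches(line):
                bad.append(line)
    return bad


if __name__ == "__main__":
    print("disagreements:", disagreements())
```

An empty list only shows agreement on the strings that were tried; it is evidence, not a proof.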

> I wonder if perl5 so-called regular expressions are 'closed under
> and' in this way.  Or if a perl5 regexp can be used to give the
> conjunction of two plain grep regexps, but not necessarily of two
> perl5 regexps.

I'm no mathematician, but I think that traditional BREs are closed
under disjunction if you can do the equivalent of alpha-renaming on
backreferences.  Closure under conjunction in the presence of backrefs
sounds trickier.  Handling full PCRE regexes sounds trickier still;
real Perl regexes are almost certainly impossible, given the existence
of explicitly procedural constructs including execution of arbitrary
Perl code.

> As for --not, perl5 regexps do support negation, I think: the
> pattern ^(?!x)$ matches all lines except those matching ^x$.

Not quite; it matches only the empty string.  You can always
understand lookaround as a zero-width assertion that adds a constraint
on what may match.  So to understand ^(?!x)$, first take out the
lookahead, leaving just ^$, which clearly matches only the empty
string.  Then the (?!x) just says "and also fail if there's an x
immediately after the start of the empty string we're matching", which
can never happen, so it has no effect.

Given lookaround, "all lines except those matching ^x$" is just
^(?!x$) which says "match the beginning of the string, except where
that is immediately followed by an x and the end of the string".
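
The difference between the two patterns is easy to see by running both over a few lines. This is a small sketch using Python's re module, on the assumption that its lookahead semantics match Perl's for patterns this simple. The sample lines and the helper name are invented for the example.

```python
import re

# Lookahead placed before "$": only the empty string can match.
EMPTY_ONLY = re.compile(r"^(?!x)$")
# "$" moved inside the lookahead: rejects only the line that is exactly "x".
NOT_JUST_X = re.compile(r"^(?!x$)")

LINES = ["", "x", "xy", "y", "ax"]


def matching(pattern, lines):
    """Return the lines on which the compiled pattern matches somewhere."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    print(matching(EMPTY_ONLY, LINES))
    print(matching(NOT_JUST_X, LINES))
```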

I think it's always mathematically possible to express "anything
except regex R" using lookaround, but it can certainly be hard to
write such regexes.
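
For the simpler question "select the lines in which R matches nowhere", one common idiom is to wrap R in a negative lookahead anchored at the start of the line. The sketch below is my own illustration in Python's re, not something proposed in this thread. The names negate and lines_without are hypothetical, and it assumes R carries no inline flags and that lines contain no newline characters.

```python
import re


def negate(regex):
    """Build a pattern that matches a line only if `regex` matches nowhere in it.

    Assumption: `regex` is a plain pattern string without inline flags.
    """
    return re.compile(r"^(?!.*(?:" + regex + r"))")


def lines_without(regex, lines):
    """Return the lines on which `regex` fails to match."""
    negated = negate(regex)
    return [line for line in lines if negated.search(line)]


if __name__ == "__main__":
    print(lines_without(r"^x$", ["", "x", "xy", "y", "ax"]))
    print(lines_without(r"hello", ["hello", "say hello", "hi"]))
```

For an R that is already anchored, such as ^x$, this selects the same lines as the hand-written ^(?!x$).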

> >Making grep do more with less is on my radar, but it is not a
> >priority at the moment. There are some serious bugs that need to be
> >fixed first.
> 
> Understood.

For the record, I'm strongly in favour of an option --all which would
require all -e patterns to match, and being able to negate selected
patterns would also be helpful.

But, yes, I also understand the need to fix the existing bugs before
finding exciting places for new ones to hide. :-)
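
To make the intended behaviour of --all concrete: it is a proposal in this thread, not an existing grep option, and the semantics can be sketched outside grep in a few lines. The Python below is an illustration under that assumption. The function name select_lines and its parameters are hypothetical, and Python's re stands in for grep's matcher.

```python
import re


def select_lines(lines, required, forbidden=()):
    """Keep lines that match every `required` pattern and no `forbidden` one.

    `required` plays the role of the -e patterns under the proposed --all;
    `forbidden` plays the role of individually negated patterns.
    """
    req = [re.compile(p) for p in required]
    forb = [re.compile(p) for p in forbidden]
    return [
        line
        for line in lines
        if all(r.search(line) for r in req)
        and not any(f.search(line) for f in forb)
    ]


if __name__ == "__main__":
    sample = ["hello", "say hello", " hello", "world"]
    print(select_lines(sample, ["hello", r"^\w+"]))
    print(select_lines(sample, ["hello", r"^\w+"], forbidden=["^say"]))
```

Testing each pattern separately avoids having to build a single combined regex at all.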

-- 
Aaron Crane



