[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪
From: |
Trent W. Buck |
Subject: |
bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union) |
Date: |
Fri, 18 Oct 2019 22:49:23 +1100 |
User-agent: |
Mutt/1.10.1 (2018-07-13) |
Paul Eggert wrote:
> On 10/16/19 5:19 PM, Trent W. Buck wrote:
> > I would expect "grep -Fw -e 4GB -e DDR4 --and" to print the same thing as
> >
> > grep -Fw 4GB | grep -Fw DDR4 | grep -Fw -e 4GB -e DDR4 -o
>
> You're right, it's not obvious. :-)
>
> It may be better to just pipe greps together, as you do now. That's simple
> and fast enough for this relatively-uncommon case, and it's portable to all
> greps.
I admit that most of the time, I want "grep --and" for a small dataset
(<1MB computer_parts.txt), so it's merely a convenience.
Sometimes I grep audit logs (~1TB uncompressed), which takes anywhere
from 15 minutes to 3 days, depending on how I tweak my grep calls.
In that case, each grep in the pipeline has to pay the costs to
de-serialize input from the previous grep, and re-serialize output to
the next grep. If the first grep matches (say) 200GB of the 1TB,
that's can be a lot of overhead (I assume).
I was basically hoping that if it was all in a single grep process,
the de/serialization steps could be skipped completely.
I think the buzzword for that is "zero-copy"?
I've noticed "grep" is about 30% slower than either "grep -F" or
"LC_COLLATE=C grep", because (I think) it avoids the costs of decoding
from UTF-8 to Unicode and back. So I was basically expecting a
similar saving from --and.
I'm only speaking as an end user - I haven't dug through the grep
source, so those expectations might be unrealistic, and implementing
it might be painful/impossible. I figured I should at least ask :-)
If your expert opinion is that it's a pain to implement (and
maintain!) and there's not enough demand, then I'm OK with that.
This is NOT something that's burning me every day.
Regardless, I appreciate you taking the time to discuss it. :-)
PS: Regarding portability, I'm personally not worried because when I
need a GNUism badly enough (e.g. du --threshold), I can usually get
permission to install the relevant GNU software, even if it's only
into %APPDATA% or $HOME.
PS: I noticed on bugs.gnu.org something about grep being
single-threaded, which might mean "grep --and" would end up being
SLOWER than the existing pipelines, since the kernel can distribute
a pipeline's elements across multiple CPUs/cores.
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Trent W. Buck, 2019/10/14
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Norihiro Tanaka, 2019/10/16
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Paul Eggert, 2019/10/16
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Trent W. Buck, 2019/10/16
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Paul Eggert, 2019/10/17
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union),
Trent W. Buck <=
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Paul Eggert, 2019/10/18
- bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Paul Jackson, 2019/10/18
bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union), Norihiro Tanaka, 2019/10/19