bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#16966: [PATCH] grep: optimization with the superset of DFA


From: Norihiro Tanaka
Subject: bug#16966: [PATCH] grep: optimization with the superset of DFA
Date: Wed, 02 Apr 2014 07:21:02 +0900

Paolo Bonzini wrote:
> Yeah, but my problem is that a.b will look at a very long line if it
> is translated to a[\x0-\xff]*b.  Better translate it to a[\x0-\xff]{1,2}b
> or something similar.

I seem that it's no problem.

For example, I try following text for the pattern `a.b'.  Whereas the
digit isn't a part of the text but the line number.

1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 cccccccccccccccccccccccccccccccccccccccb

a --> a + CSET --> a + CSET --> ...... --> a + CSET + b (match)

Then all lines are matched fast.  However, because matches multiple
lines, retry from the last line (line 5).

It doesn't matches the pattern `a.b'.  Therefore the text is rejected.

Next, I try following text for it.

1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 accccccccccccccccccccccccccccccccccccccb

Then all lines are matched fast.  However, because matches multiple
lines, retry from the last line (line 5).  It's accepted by the
superset.  However, It's rejected by normal DFA.

On the other hands, It can be constituted just three DFA states.  It's
too simple.

             /\
            /  \
            \  /
             \/
  1:a ---> 2:CSET ---> 3:b

Thanks,
Norihiro






reply via email to

[Prev in Thread] Current Thread [Next in Thread]