[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bug#22357: grep -f not only huge memory usage, but also huge time co
From: |
Norihiro Tanaka |
Subject: |
Re: bug#22357: grep -f not only huge memory usage, but also huge time cost |
Date: |
Mon, 12 Dec 2016 23:43:47 +0900 |
On Sun, 11 Dec 2016 18:55:32 +0100
Bruno Haible <address@hidden> wrote:
> Norihiro Tanaka wrote:
> > dfa matcher is not always slower than kws matcher. ...
> > It's a trade-off. Can you have any idea to select the better
> > matcher for both two cases?
>
> There are at least two approaches.
>
> 1) If you have vastly different run times (2 sec vs. 10 min or so), the
> program can set up two threads and run each algorithm in one thread. Buffer
> the output. When a thread terminates, use its output and kill the other
> thread.
>
> Now that is still suboptimal, because on average this approach will lose a
> factor of 2, just because it does not know which is the best algorithm.
>
> 2) You can run a set of benchmarks, indexed by 2 parameters (m,n)
> m = number of keywords,
> n = total number of bytes of the files to be searched,
> run each of the two algorithms, and note the best algorithm. This way
> you fill a matrix of results (in the (m,n) plane). Now find a formula
> that roughly tells you for given (m,n) whether you are in one area of
> the plane (where kwset is better) or in the other area of the plane
> (where dfa is better). Finally, code this formula into the 'grep' program.
We can not know N before searching. If input is a regular file, we can
examine it with stat(), but it may be STDIN.
BTW, following test case uses a lot of memory with grep matcher
specially, as grep compiles both dfa and regex. dfa uses a lot of memory
in calculation of `D->follow' in dfaanalyze(). In my machine, dfa uses
about 500MB, and regex uses about 1GB.
$ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, (continued)
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Bruno Haible, 2016/12/11
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, arnold, 2016/12/11
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/11
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Bruno Haible, 2016/12/12
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/14
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/17
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/19
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/19
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/19
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/20
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost,
Norihiro Tanaka <=
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Trevor Cordes, 2016/12/12
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/12
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Trevor Cordes, 2016/12/12
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/12
- Re: bug#22357: grep -f not only huge memory usage, but also huge time cost, L.A. Walsh, 2016/12/15