bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

dfa is slower than regex in some cases


From: Norihiro Tanaka
Subject: dfa is slower than regex in some cases
Date: Sun, 03 Dec 2017 14:22:23 +0900

Hi,

A case that dfa is slower than regex on bug-gawk mailing list is
discussed 4 months ago.

http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html

Although gawk is fixed to not use dfa in some cases, it looks like a
temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)

--------
$ env LC_ALL=C ./gawk --version
GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
Copyright (C) 1989, 1991-2017 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
real 1.38
user 1.17
sys 0.11
$ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
real 0.51
user 0.43
sys 0.07
--------

This problem seems to be due to dfa.c.  In fact, grep is also slower
than not to use dfa in the same cases.

--------
$ env LC_ALL=C src/grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see 
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 1.06
user 0.97
sys 0.08
--------

  # grep does not have control by GREP_NO_DFA environment, so I changed
  # as following and rebuilded, and run the same test as above.

========
diff --git a/lib/dfa.c b/lib/dfa.c
index 2b9c80e..f234b88 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool 
searchflag)
   dfaparse (s, len, d);
   dfassbuild (d);

-  if (dfa_supported (d))
-    {
-      dfaoptimize (d);
-      dfaanalyze (d, searchflag);
-    }
-  else
-    {
-      d->dfaexec = dfaexec_noop;
-    }
-
-  if (d->superset)
-    {
-      d->fast = true;
-      dfaanalyze (d->superset, searchflag);
-    }
+  d->dfaexec = dfaexec_noop;
 }

 /* Free the storage held by the components of a dfa.  */
========

--------
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 0.43
user 0.37
sys 0.05
--------

However, I do not still know why dfa is slower than regex in this case.

Thanks,
Norihiro




reply via email to

[Prev in Thread] Current Thread [Next in Thread]