[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
dfa is slower than regex in some cases
From: |
Norihiro Tanaka |
Subject: |
dfa is slower than regex in some cases |
Date: |
Sun, 03 Dec 2017 14:22:23 +0900 |
Hi,
A case that dfa is slower than regex on bug-gawk mailing list is
discussed 4 months ago.
http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html
Although gawk is fixed to not use dfa in some cases, it looks like a
temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)
--------
$ env LC_ALL=C ./gawk --version
GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
Copyright (C) 1989, 1991-2017 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
real 1.38
user 1.17
sys 0.11
$ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
real 0.51
user 0.43
sys 0.07
--------
This problem seems to be due to dfa.c. In fact, grep is also slower
than not to use dfa in the same cases.
--------
$ env LC_ALL=C src/grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 1.06
user 0.97
sys 0.08
--------
# grep does not have control by GREP_NO_DFA environment, so I changed
# as following and rebuilded, and run the same test as above.
========
diff --git a/lib/dfa.c b/lib/dfa.c
index 2b9c80e..f234b88 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool
searchflag)
dfaparse (s, len, d);
dfassbuild (d);
- if (dfa_supported (d))
- {
- dfaoptimize (d);
- dfaanalyze (d, searchflag);
- }
- else
- {
- d->dfaexec = dfaexec_noop;
- }
-
- if (d->superset)
- {
- d->fast = true;
- dfaanalyze (d->superset, searchflag);
- }
+ d->dfaexec = dfaexec_noop;
}
/* Free the storage held by the components of a dfa. */
========
--------
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 0.43
user 0.37
sys 0.05
--------
However, I do not still know why dfa is slower than regex in this case.
Thanks,
Norihiro
- dfa is slower than regex in some cases,
Norihiro Tanaka <=