[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manu
From: |
Assaf Gordon |
Subject: |
bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually |
Date: |
Thu, 20 Oct 2016 22:08:34 -0400 |
Hello Norihiro,
Thanks for the update, I hope to have the tests soon.
For ease of discussion, attached all the accumulated patches as one file.
I'd like to ask about the DFA improvements in general (for my understanding):
First, every regex (whether in a sed address or in s/// command)
is compiled by both the previous regex engine (re_compile_match from gnulib's
regcomp.c)
and by the new DFA engine.
If the regex is simple anchor, the regex compilation is skipped (which is this
optimization thread).
Then, every executed regex is tested as follow:
1. If begline/endline - match without running regex at all
2. If the regex search starts at the beginning of the buffer (i.e. not "s///g"),
2.1. check with dfa/superset (if such superset exists).
If no match is found - return "no-match".
2.2. fall back to normal dfa.
If no match is found - return "no-match".
2.3. There one more case which returns a match:
if (!regsize && (regex->flags & REG_NEWLINE) && !backref)
return 1;
I'm not sure which cases it covers?
3. If DFA did not reject the match,
then the 'old' regex engine is executed with 're_search'
(or the custom code for 're_search' for multiline+NUL)
Is this description correct ?
In otherwise, the optimization of DFA is meant to
cover the more common case of no-matching,
at the expense of slightly more work when there is a possible match?
Lastly,
Could you expand on dfa's 'superset' and 'isfast' ?
what are the cases which there is/no superset,
and when is a DFA regex fast?
Many thanks!
- assaf
sed-regexp-optimizations-2016-10-20.patch.gz
Description: GNU Zip compressed data
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Norihiro Tanaka, 2016/10/04
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Assaf Gordon, 2016/10/05
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Norihiro Tanaka, 2016/10/06
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Assaf Gordon, 2016/10/17
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Norihiro Tanaka, 2016/10/18
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually,
Assaf Gordon <=
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Norihiro Tanaka, 2016/10/21
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Assaf Gordon, 2016/10/30
- bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually, Norihiro Tanaka, 2016/10/30