bug#18777: [PATCH] dfa: improvement for checking of multibyte character

bug-grep

From:	Norihiro Tanaka
Subject:	bug#18777: [PATCH] dfa: improvement for checking of multibyte character boundary
Date:	Thu, 23 Oct 2014 00:28:35 +0900

address@hidden wrote:
> Gawk does not remove CR in advance, unless someone specifically
> set RS = "\r\n", in which case the full regex matcher is used
> to first find \r\n in the raw input buffer.

Thanks, I also confirmed this in the Gawk source code.

> So for gawk, adding a check for (c == eolbyte || c == '\r')
> should produce more speedup on Windows.
> 
> (Hmm, on Windows the default is probably text mode which causes
> the library/OS to hide the \r anyway. Harumph.  But if binary mode
> was requested then it could still make a difference.)

I think it's better to build a KWset than to rely on checking for '\r'
in the non-UTF-8 multibyte mode of DFA.

Furthermore, even if we add a check for '\r' to DFA, I don't think we can
use it for a speedup on Windows, because DFA can't correctly locate the
matched position unless the pattern is a fixed string.
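
For reference, the kind of check being discussed could be sketched roughly
as follows. This is only an illustration, not code from dfa.c: the helper
name resync_start is hypothetical, and it assumes that neither eolbyte nor
'\r' can occur inside a multibyte character in the encoding in use, so that
the position just after either byte is a safe place to restart the
character-boundary scan.

```c
#include <stddef.h>

/* Hypothetical helper (not part of dfa.c).  Scan backward from POS and
   return the offset just past the nearest preceding byte that equals
   EOLBYTE or '\r'.  Return 0 if no such byte precedes POS.
   Assumption: neither byte can appear inside a multibyte character.  */
static size_t
resync_start (char const *buf, size_t pos, char eolbyte)
{
  while (pos > 0)
    {
      char c = buf[pos - 1];
      if (c == eolbyte || c == '\r')
        break;                  /* buf[pos] starts on a character boundary */
      pos--;
    }
  return pos;
}
```

With the buffer "ab\r\ncd" and eolbyte '\n', a scan starting at offset 6
stops at offset 4, just after the newline. Without the '\r' test, a buffer
that contains bare '\r' bytes and few newlines would force a longer
backward scan.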
