From: Norihiro Tanaka
Subject: bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always
Date: Tue, 16 Aug 2016 23:35:22 +0900

On Sun, 10 Jul 2016 18:51:43 +0900
Norihiro Tanaka <address@hidden> wrote:

> In multibyte locales, if a pattern starts with a period expression,
> matching is still slow, because the transition table is built at run time
> even when the next character in the input text is single byte.
> 
> This patch changes dfa to always use the single-byte algorithm for any
> single-byte character in the input text.  If the transition table has
> already been built and the next character in the input text is single byte,
> the next state is found by consulting only the pre-built transition table,
> even from a state that includes ANYCHAR.
> 
> $ yes "$(printf 'a%038db\n' 0)" | head -1000000 >in
> $ env LC_ALL=C gcc -v
> Reading specs from /usr/local/lib/gcc/x86_64-pc-linux-gnu/4.4.7/specs
> Target: x86_64-pc-linux-gnu
> Configured with: ./configure --with-as=/usr/local/bin/as 
> --with-ld=/usr/local/bin/ld --with-system-zlib --enable-__cxa_atexit
> Thread model: posix
> gcc version 4.4.7 (GCC)
> 
> patch#21486 must be applied before this patch.  With it in place, this
> patch speeds grep up further.

I updated the patch to follow the changes made in bug#21486, and added a
second patch containing a minor change.
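
To illustrate the idea described in the quoted text, here is a minimal sketch in C.  It is not the code from the attached patches and uses none of the real dfa.c names: the states, the table layout and the helpers (step, compute_single, utf8_len, match) are all hypothetical, and it assumes valid UTF-8 input, where a byte below 0x80 is always a complete single-byte character.  The toy automaton matches the pattern ".b" at the start of the input.  The point is only that a single-byte character is resolved through the byte-indexed table even from the state that contains ANYCHAR, while multibyte characters still take the slower path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy automaton for the pattern ".b", matched at the start of the input.
   All names here are hypothetical; none are taken from dfa.c.  */
enum { START, SAW_ANY, ACCEPT, DEAD, NSTATES };
enum { UNBUILT = -1 };

static int trans[NSTATES][256];   /* lazily built, byte-indexed table */
static bool trans_ready;          /* has TRANS been set to UNBUILT yet? */
static int table_builds;          /* table entries computed at run time */
static int slow_steps;            /* characters handled by the slow path */

/* Compute the successor of state S on the single-byte character C.  */
static int compute_single (int s, unsigned char c)
{
  switch (s)
    {
    case START:   return c != '\n' ? SAW_ANY : DEAD;   /* '.' (ANYCHAR) */
    case SAW_ANY: return c == 'b' ? ACCEPT : DEAD;
    case ACCEPT:  return ACCEPT;
    default:      return DEAD;
    }
}

/* Length of the UTF-8 sequence introduced by lead byte C.
   Assumes valid input; a stray continuation byte counts as length 1.  */
static size_t utf8_len (unsigned char c)
{
  if (c >= 0xF0) return 4;
  if (c >= 0xE0) return 3;
  if (c >= 0xC0) return 2;
  return 1;
}

/* Advance from state S over one character at P (AVAIL bytes remain).
   Store the number of bytes consumed in *LEN and return the next state.  */
static int step (int s, const unsigned char *p, size_t avail, size_t *len)
{
  assert (0 <= s && s < NSTATES && avail > 0);
  if (!trans_ready)
    {
      for (int i = 0; i < NSTATES; i++)
        for (int j = 0; j < 256; j++)
          trans[i][j] = UNBUILT;
      trans_ready = true;
    }

  if (*p < 0x80)
    {
      /* Single-byte character: consult the table, even when S is the
         state containing ANYCHAR.  Build the entry on first use.  */
      if (trans[s][*p] == UNBUILT)
        {
          trans[s][*p] = compute_single (s, *p);
          table_builds++;
        }
      *len = 1;
      return trans[s][*p];
    }

  /* Multibyte character: slow path, no table lookup.  */
  slow_steps++;
  size_t n = utf8_len (*p);
  *len = n < avail ? n : avail;
  switch (s)
    {
    case START:  return SAW_ANY;    /* ANYCHAR accepts the whole character */
    case ACCEPT: return ACCEPT;
    default:     return DEAD;
    }
}

/* Return true if TEXT starts with a match for ".b".  */
static bool match (const char *text)
{
  const unsigned char *p = (const unsigned char *) text;
  size_t left = strlen (text);
  int s = START;
  while (left > 0 && s != DEAD)
    {
      size_t len;
      s = step (s, p, left, &len);
      p += len;
      left -= len;
    }
  return s == ACCEPT;
}
```

In this sketch, calling match ("ab") twice computes table entries only on the first call; the second call is served entirely from the table.  An input whose first character is multibyte, such as "\xc3\xa9" "b", goes through the slow path once, for that character only.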

Attachment: 0001-dfa-use-algorithm-for-single-byte-character-to-any-s.patch
Description: Text document

Attachment: 0002-dfa-avoid-invalid-character-matches-period.patch
Description: Text document

