bug#17025: [PATCH] grep: matching line-by-line with regex

bug-grep

From:	Paolo Bonzini
Subject:	bug#17025: [PATCH] grep: matching line-by-line with regex
Date:	Tue, 01 Apr 2014 11:10:38 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 17/03/2014 15:49, Norihiro Tanaka wrote:

Package: grep
Tags: patch

I ran the following test, which uses the regex engine in a non-UTF-8 locale.

$ yes abcd.abc | head -10000 > m
$ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
real 7.28
user 6.36
sys 0.57

It's extremely slow.  When grep uses the regex engine, the text is
split into lines.  However, the whole buffer is passed to re_search and
re_match.  That seems wrong to me.


Yes, very good catch.

It's likely that the old bytecode matcher didn't care, but the new one in glibc has to process even the "ignored" part of the buffer to find the boundaries of multibyte characters.


Paolo
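
As an illustration of the line-by-line idea discussed above, here is a
minimal sketch in C using the GNU regex interface from glibc's <regex.h>.
It is not the patch under review, and the helper name count_matching_lines
is hypothetical.  The sketch assumes a glibc system and the POSIX basic
syntax.  The point it shows is that each call to re_search receives only
the current line (pointer and length), so the matcher never has to scan
the rest of the buffer.

```c
#define _GNU_SOURCE   /* expose re_search and friends in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from grep): count the lines in buf[0..size)
   that match PATTERN, handing the matcher one line at a time.
   Returns -1 if the pattern does not compile.  */
static long
count_matching_lines (const char *pattern, const char *buf, size_t size)
{
  struct re_pattern_buffer pat;
  long count = 0;

  assert (pattern != NULL && buf != NULL);

  /* A zeroed pattern buffer lets re_compile_pattern allocate storage.  */
  memset (&pat, 0, sizeof pat);
  re_set_syntax (RE_SYNTAX_POSIX_BASIC);
  if (re_compile_pattern (pattern, strlen (pattern), &pat) != NULL)
    return -1;

  const char *beg = buf;
  const char *lim = buf + size;
  while (beg < lim)
    {
      /* Find the end of the current line (newline or end of buffer).  */
      const char *nl = memchr (beg, '\n', (size_t) (lim - beg));
      const char *end = nl ? nl : lim;
      regoff_t len = (regoff_t) (end - beg);

      /* Only the current line is visible to the matcher, not the
         whole buffer.  */
      if (re_search (&pat, beg, len, 0, len, NULL) >= 0)
        count++;

      beg = nl ? nl + 1 : lim;
    }

  regfree (&pat);
  return count;
}
```

With the pattern from the report, calling the helper on a buffer holding
the two lines "abcd.abc" and "abcdxabd" counts one matching line, since
only the second line ends in "abd".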
