From: Paolo Bonzini
Subject: bug#17025: [PATCH] grep: matching line-by-line with regex
Date: Tue, 01 Apr 2014 11:10:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 17/03/2014 15:49, Norihiro Tanaka wrote:
Package: grep
Tags: patch

I ran the following test, which uses the regex engine in a non-UTF-8 locale.

    $ yes abcd.abc | head -10000 > m
    $ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
    real 7.28
    user 6.36
    sys 0.57

It's extremely slow. When the regex engine is used in grep, the text is split by line. However, the whole buffer is passed to re_search and re_match. That seems wrong to me.

Yes, very good catch. It's likely that the old bytecode matcher didn't care, but the new one in glibc has to process even the "ignored" part of the buffer to find the boundaries of multibyte characters.

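To illustrate the idea of handing the matcher one line at a time rather than the whole buffer, here is a minimal sketch. It is not the patch under discussion: it uses the POSIX regcomp/regexec interface instead of re_search, and since regexec needs a NUL-terminated string it copies each line first. The function name count_matching_lines is hypothetical.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the lines in buf[0..len) that match the
   POSIX extended regex `pattern`.  Returns -1 on error.  Each call to
   the matcher sees exactly one line, never the rest of the buffer. */
long count_matching_lines(const char *pattern, const char *buf, size_t len)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    long count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        /* Find the end of the current line (or of the buffer). */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t n = nl ? (size_t)(nl - p) : (size_t)(end - p);

        /* regexec wants a NUL-terminated string, so copy the line. */
        char *line = malloc(n + 1);
        if (line == NULL) {
            regfree(&re);
            return -1;
        }
        memcpy(line, p, n);
        line[n] = '\0';

        if (regexec(&re, line, 0, NULL, 0) == 0)
            count++;
        free(line);

        /* Skip past the newline, if there was one. */
        p += n + (nl ? 1 : 0);
    }

    regfree(&re);
    return count;
}
```

With this shape, the cost of each match attempt depends on the length of one line, not on how much of the buffer follows it. The per-line copy is only there because of the POSIX interface; an interface that takes an explicit length would not need it.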
Paolo