From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#21486: closed ([PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales)
Date: Tue, 16 Aug 2016 07:53:02 +0000
Your message dated Tue, 16 Aug 2016 00:51:52 -0700 with message-id <address@hidden> and subject line "Re: bug#21486: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales" has caused the debbugs.gnu.org bug report #21486, regarding "[PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales", to be marked as done.

(If you believe you have received this mail in error, please contact address@hidden.)

--
21486: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21486
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales
Date: Tue, 15 Sep 2015 22:46:59 +0900

For many patterns, matching in non-UTF8 multibyte locales is as fast as in single-byte locales, thanks to many improvements. However, patterns that begin with one or more periods are still slow. This change improves performance for them.

Compare the run times of these commands before and after this change (on an i5-4570 CPU @ 3.20GHz running rawhide (~fedora 22), compiled with gcc 5.1.1 20150618):

    $ yes "$(printf 'a%38db\n' 0)" | head -1000000 >in

    $ env LC_ALL=ja_JP.eucJP time -p src/grep '.a.b' in
    Before: 2.33  After: 0.69

    $ env LC_ALL=ja_JP.eucJP time -p src/grep ^..............................$ in
    Before: 2.35  After: 0.45

    $ env LC_ALL=ja_JP.eucJP time -p src/grep .......................................... in
    Before: 19.10  After: 0.55

By the way, the two patches in bug#21266 need to be applied before this patch.

0003-dfa-cache-transition-from-a-state-with-dot-expressio.patch
Description: Text document
--- End Message ---
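The core idea of the patch, remembering a transition once it has been computed instead of redoing the work for every input character, can be sketched as a lazily filled transition table. This is a minimal toy under stated assumptions, not grep's dfa.c: it works on single bytes rather than multibyte characters, and the automaton (counting up to K dots), the table trans_cache and the functions slow_transition, cached_transition and run are all hypothetical. slow_transition stands in for whatever per-character work the real matcher has to redo in a non-UTF8 multibyte locale.

```c
#include <assert.h>

/* Hypothetical toy automaton for a pattern of K dots: state s (0..K)
   counts how many bytes have been matched so far, and every byte
   advances it, saturating at K.  Not grep's actual data structures.  */
enum { K = 4, NSTATES = K + 1, NBYTES = 256 };

static int trans_cache[NSTATES][NBYTES]; /* -1 means not computed yet */
static unsigned long slow_calls;         /* times the slow path ran */

/* Mark every cached transition as unknown.  */
static void cache_init(void)
{
  int s, c;
  for (s = 0; s < NSTATES; s++)
    for (c = 0; c < NBYTES; c++)
      trans_cache[s][c] = -1;
  slow_calls = 0;
}

/* Hypothetical stand-in for the expensive successor computation.  */
static int slow_transition(int s, unsigned char c)
{
  (void) c;                     /* a dot matches any byte */
  slow_calls++;
  return s < K ? s + 1 : K;
}

/* Successor of state S on byte C, computed at most once per pair.  */
static int cached_transition(int s, unsigned char c)
{
  assert(0 <= s && s < NSTATES);
  if (trans_cache[s][c] < 0)
    trans_cache[s][c] = slow_transition(s, c);
  return trans_cache[s][c];
}

/* Feed the NUL-terminated TEXT through the automaton; return nonzero
   if at least K bytes were consumed.  */
static int run(const char *text)
{
  const unsigned char *p;
  int s = 0;
  for (p = (const unsigned char *) text; *p; p++)
    s = cached_transition(s, *p);
  return s == K;
}
```

After cache_init(), running run("aaaaaaaa") computes only the K + 1 distinct transitions on 'a' (slow_calls ends at 5); every later byte, and every byte of a second identical line, is a plain table lookup. Saving repeated work of this kind is presumably what the timings above reflect for dot-heavy patterns.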
--- Begin Message ---
Subject: Re: bug#21486: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales
Date: Tue, 16 Aug 2016 00:51:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Thanks for writing that patch. I installed it in grep master (after tweaking the commit message a bit) and am marking this bug report as done.

I noticed what appears to be a problem in the patch, in the code:

    d->mb_trans[s][mb_index & ~0] = state;

I expect the "0" was intended to be a "1". I attempted to fix this by installing the attached patch 1, using + rather than & and |, as this was easier for me to follow and is likely a tiny bit faster anyway.

I also installed the attached patch 2, which ports the resulting dfa.c to C90; as I understand it, Gawk still needs this.

I also installed the attached patch 3, which does some minor refactoring, cleanup, and comment fixes. As you can see, I am a fan of Leibniz-style comparison (preferring < to > and <= to >=).

Thanks again for the patch.

0001-dfa-fix-context-newline-confusion.patch
Description: Text Data
0002-dfa-port-to-C90.patch
Description: Text Data
0003-dfa-minor-refactoring-and-doc-fixes.patch
Description: Text Data
--- End Message ---
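The reviewer's point about `mb_index & ~0` comes down to plain bit arithmetic, which this C sketch checks: because ~0 has every bit set, `x & ~0` always equals `x`, so the mask does nothing, while `x & ~1` clears only bit 0. It also shows why + can stand in for | when the operands share no set bits. The function names and the index layout (a base value shifted left by one, with a one-bit flag in bit 0) are hypothetical illustrations; the message does not say how mb_index is actually composed in dfa.c. Unsigned constants are used so the arithmetic is well defined.

```c
#include <assert.h>

/* ~0u has every bit set, so masking with it changes nothing.  */
static unsigned int mask_with_not_zero(unsigned int x)
{
  return x & ~0u;
}

/* ~1u has every bit set except bit 0, so masking clears only bit 0.  */
static unsigned int mask_with_not_one(unsigned int x)
{
  return x & ~1u;
}

/* Hypothetical index layout: BASE in the upper bits, a one-bit FLAG in
   bit 0.  Bit 0 of BASE << 1 is always zero, so | and + agree here.  */
static unsigned int pack_with_or(unsigned int base, unsigned int flag)
{
  assert(flag <= 1);
  return (base << 1) | flag;
}

static unsigned int pack_with_add(unsigned int base, unsigned int flag)
{
  assert(flag <= 1);
  return base * 2 + flag;
}
```

For example, pack_with_or(5u, 1u) and pack_with_add(5u, 1u) both yield 11, and masking that result with ~1u recovers the flag-free index 10, whereas masking with ~0u leaves it at 11.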