emacs-bug-tracker
From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#21486: closed ([PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales)
Date: Tue, 16 Aug 2016 07:53:02 +0000

Your message dated Tue, 16 Aug 2016 00:51:52 -0700
with message-id <address@hidden>
and subject line Re: bug#21486: [PATCH 3/3] dfa: cache transition from a state 
with dot expression in non-UTF8 multibyte locales
has caused the debbugs.gnu.org bug report #21486,
regarding [PATCH 3/3] dfa: cache transition from a state with dot expression in 
non-UTF8 multibyte locales
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
21486: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21486
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales
Date: Tue, 15 Sep 2015 22:46:59 +0900

Thanks to many improvements, matching in non-UTF8 multibyte locales is
now as fast as in single-byte locales for many patterns.  However,
patterns that begin with one or more periods are still slow.  This
change improves performance for them.
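
The general idea of caching a transition can be sketched as a lazily filled table: the expensive computation runs once per (state, byte) pair and later lookups reuse the stored result. This is a minimal illustration only. The table sizes, the names `cache`, `compute_transition` and `next_state`, and the stand-in transition rule are all hypothetical and are not taken from dfa.c.

```c
#include <stddef.h>

#define NSTATES 4        /* hypothetical number of states */
#define NBYTES 256       /* one slot per possible byte value */
#define UNCACHED (-1)    /* marks a slot that has not been computed yet */

static int cache[NSTATES][NBYTES];
static unsigned long slow_calls;   /* counts uncached computations */

/* Stand-in for an expensive transition computation (hypothetical rule). */
static int compute_transition(int state, unsigned char byte)
{
  slow_calls++;
  return (state + byte) % NSTATES;
}

/* Mark every slot as not yet computed. */
static void cache_init(void)
{
  int s, b;

  for (s = 0; s < NSTATES; s++)
    for (b = 0; b < NBYTES; b++)
      cache[s][b] = UNCACHED;
}

/* Return the next state, computing it only on the first request. */
static int next_state(int state, unsigned char byte)
{
  int *slot = &cache[state][byte];

  if (*slot == UNCACHED)
    *slot = compute_transition(state, byte);
  return *slot;
}
```

After `cache_init ()`, repeated calls such as `next_state (1, 'a')` invoke the slow path only once.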

Compare the run times (in seconds) of the commands below before and
after this change (on an i5-4570 CPU @ 3.20GHz using rawhide
(~fedora 22), compiled with gcc 5.1.1 20150618):
yes "$(printf 'a%38db\n' 0)" | head -1000000 >in

env LC_ALL=ja_JP.eucJP time -p src/grep '.a.b' in
  Before: 2.33
  After : 0.69

env LC_ALL=ja_JP.eucJP time -p src/grep ^..............................$ in
  Before: 2.35
  After : 0.45

env LC_ALL=ja_JP.eucJP time -p src/grep \
.......................................... in
  Before: 19.10
  After :  0.55
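
Dividing each "Before" time by the matching "After" time gives speedups of roughly 3.4x, 5.2x and 34.7x for the three commands. The helper below is only a sketch of that arithmetic; the name `speedup` is hypothetical.

```c
/* Speedup factor: how many times faster the "after" run is.
   Both arguments are run times in seconds.  */
static double speedup(double before_s, double after_s)
{
  return before_s / after_s;
}
```

For example, `speedup (19.10, 0.55)` is about 34.7.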

By the way, the two patches in bug#21266 need to be applied before
this patch.

Attachment: 0003-dfa-cache-transition-from-a-state-with-dot-expressio.patch
Description: Text document


--- End Message ---
--- Begin Message ---
Subject: Re: bug#21486: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales
Date: Tue, 16 Aug 2016 00:51:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Thanks for writing that patch. I installed it in grep master (after tweaking the commit message a bit) and am marking this bug report as done.

I noticed what appears to be a problem in the patch, in the code:

  d->mb_trans[s][mb_index & ~0] = state;

I expect the "0" was intended to be a "1". I attempted to fix this by installing the attached patch 1, using + rather than & and |, as this was easier for me to follow and is likely a tiny bit faster anyway.
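
To illustrate why "& ~0" looks like a slip: ~0 has every bit set, so masking with it leaves the index unchanged, whereas ~1 clears the lowest bit. The sketch below also shows that adding a 0-or-1 flag to an even index gives the same result as OR-ing it in, which is one plausible reading of "using + rather than & and |". The helper names are hypothetical and are not taken from dfa.c or the attached patch.

```c
#include <stddef.h>

/* Hypothetical helpers, not from dfa.c: they only illustrate the masks.  */

/* ~0 has all bits set, so this returns idx unchanged (a no-op mask).  */
static size_t mask_with_not_zero(size_t idx)
{
  return idx & ~(size_t) 0;
}

/* ~1 has all bits set except bit 0, so this clears the lowest bit.  */
static size_t mask_with_not_one(size_t idx)
{
  return idx & ~(size_t) 1;
}

/* When even_idx is even, adding a 0-or-1 flag equals OR-ing it in.  */
static size_t add_flag(size_t even_idx, int flag)
{
  return even_idx + (flag != 0);
}
```

For instance, `mask_with_not_zero (5)` is 5, while `mask_with_not_one (5)` is 4.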

I also installed the attached patch 2, which ports the resulting dfa.c to C90; as I understand it Gawk still needs this.
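
As a general illustration of what porting to C90 tends to involve (the actual changes are in the attached patch and are not reproduced here): C90 requires declarations at the start of a block and does not allow a declaration inside a for statement. The function below is a hypothetical example written in that style, not code from dfa.c.

```c
#include <stddef.h>

/* Hypothetical function, not from dfa.c.  C99 would allow
   "for (size_t i = 0; ...)"; C90 needs i declared at the top
   of the block, before any statement.  */
static size_t count_dots(const char *pattern)
{
  size_t i;
  size_t n = 0;

  for (i = 0; pattern[i] != '\0'; i++)
    if (pattern[i] == '.')
      n++;
  return n;
}
```

For example, `count_dots (".a.b")` returns 2.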

I also installed the attached patch 3, which does some minor refactoring and cleanup and commentary fixes. As you can see, I am a fan of Leibniz-style comparison (preferring < to > and <= to >=).
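
A small sketch of that comparison style, using a hypothetical range check that is not from dfa.c: with only < and <=, the operands appear in the same order as on the number line.

```c
/* Hypothetical example, not from dfa.c.  */

/* Using only <= and <: operands read in number-line order
   (lo, then x, then hi).  */
static int in_half_open_range(int lo, int x, int hi)
{
  return lo <= x && x < hi;
}

/* The same test written with >= and >, for comparison.  */
static int in_half_open_range_gt(int lo, int x, int hi)
{
  return x >= lo && hi > x;
}
```

Both functions return the same result for every input; only the reading order differs.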

Thanks again for the patch.

Attachment: 0001-dfa-fix-context-newline-confusion.patch
Description: Text Data

Attachment: 0002-dfa-port-to-C90.patch
Description: Text Data

Attachment: 0003-dfa-minor-refactoring-and-doc-fixes.patch
Description: Text Data


--- End Message ---
