emacs-bug-tracker

[debbugs-tracker] bug#22357: closed (grep -f huge memory usage)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#22357: closed (grep -f huge memory usage)
Date: Wed, 21 Dec 2016 06:46:03 +0000

Your message dated Tue, 20 Dec 2016 21:17:01 -0800
with message-id <address@hidden>
and subject line Re: bug#22357: grep -f not only huge memory usage, but also 
huge time cost
has caused the debbugs.gnu.org bug report #22357,
regarding grep -f huge memory usage
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
22357: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22357
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: grep -f huge memory usage
Date: Tue, 12 Jan 2016 08:39:35 -0500 (EST)

Using the files [1], [2] (around 1.4 MB each) and running the
following command:

$ grep -v -f file1 file2

quickly consumes all of my memory (16 GB) and exits with a "memory
exhausted" error. I have reports that it behaves the same on a machine
with 180 GB of RAM. Admittedly, -F should be used, as the patterns
contain no regexes, but the same command worked with grep-2.5.1 and
consumed only about 0.5 GB of RAM. Maybe there is room for optimization.

thanks & regards

Jaroslav

[1] https://jskarvad.fedorapeople.org/grep/file1
[2] https://jskarvad.fedorapeople.org/grep/file2



--- End Message ---
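
The report above notes that -F is the appropriate option when the pattern file contains no regular expressions. A minimal sketch of that comparison follows. The tiny stand-in files and the variable names are assumptions made for illustration; the real file1 and file2 are about 1.4 MB each and are not reproduced here.

```shell
#!/bin/sh
# Stand-in inputs (hypothetical); the files in the report are far larger.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/file1"                 # patterns, one per line
printf 'alpha\ngamma\nbeta\ndelta\n' > "$tmpdir/file2"   # text to filter

# Invocation from the report: each pattern is treated as a regular expression.
regex_out=$(grep -v -f "$tmpdir/file1" "$tmpdir/file2")

# Same filter with -F: each pattern is treated as a fixed string.
fixed_out=$(grep -v -F -f "$tmpdir/file1" "$tmpdir/file2")

# With purely literal patterns, both invocations select the same lines.
printf '%s\n' "$fixed_out"

rm -rf "$tmpdir"
```

The two results only agree when the patterns contain no regex metacharacters, which is the situation the report describes.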
--- Begin Message ---
Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Tue, 20 Dec 2016 21:17:01 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

I installed the attached patches into grep master. These fix the performance regressions noted at the start of Bug#22357. I see that the related performance problems noted in Bug#21763 seem to be fixed too, I expect because of Norihiro Tanaka's recent changes, so I'll boldly close both bug reports.

To some extent the attached patches restore the old behavior for grep -F, when grep is given two or more patterns. The patches don't change the underlying algorithms; they merely use a different heuristic to decide whether to use the -F matcher. Although I wouldn't be surprised if the attached patches hurt performance in some cases, I didn't uncover any such cases in my performance testing, which I admit mostly consisted of running the examples in the abovementioned bug reports.

I'll leave Bug#22239 open, as I get the following performance figures (user+system CPU time) for the Bug#22239 benchmark, where list.txt is created by "aspell dump master | head -n 100000 >list.txt", and the grep commands all use the operands "-F -f list.txt /etc/passwd" in the en_US.utf8 locale on Fedora 24 x86-64.

  no -i       -i       grep version
   0.25      0.33      2.16
   0.26     10.95      2.21
   0.11      2.90*     current master (including attached patches)

In the C locale, the current grep master is always significantly faster than grep 2.16 or 2.21 on the benchmark, so the only significant problem is the number marked "*". I ran the benchmarks on an AMD Phenom II X4 910e.

Attachment: 0001-grep-simplify-line-counting-in-patterns.patch
Description: Text Data

Attachment: 0002-grep-simplify-matcher-configuration.patch
Description: Text Data

Attachment: 0003-grep-fix-performance-with-multiple-patterns.patch
Description: Text Data


--- End Message ---
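
The Bug#22239 benchmark described in the message above can be approximated with a short script. This is only a sketch under stated assumptions: the aspell command and the grep operands come from the message, while the temporary directory, the variable names, the synthetic fallback word list, and the use of the shell's `times` builtin to report user and system CPU time are choices made here. The script runs in whatever locale is current, whereas the figures in the message were taken in en_US.utf8.

```shell
#!/bin/sh
workdir=$(mktemp -d)
list="$workdir/list.txt"

# Word list as described in the message (requires aspell and a dictionary).
if command -v aspell >/dev/null 2>&1; then
    aspell dump master | head -n 100000 > "$list"
fi

# Fallback, an assumption that is not part of the original benchmark:
# generate synthetic patterns when aspell produced nothing.
if [ ! -s "$list" ]; then
    awk 'BEGIN { for (i = 1; i <= 100000; i++) print "word" i }' > "$list"
fi

pattern_count=$(wc -l < "$list" | tr -d ' ')
echo "patterns: $pattern_count"

for opts in "" "-i"; do
    echo "grep $opts -F -f list.txt /etc/passwd"
    # Run each case in a subshell so that the second line printed by
    # 'times' (user and system CPU time of child processes) covers only
    # this grep run. $opts is left unquoted on purpose: when it is empty,
    # no extra argument is passed.
    ( grep $opts -F -f "$list" /etc/passwd >/dev/null; times )
done

rm -rf "$workdir"
```

Absolute numbers will differ from the table in the message, since they depend on the grep version, the locale, and the hardware.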
