

From: arnold
Subject: Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever
Date: Thu, 13 Jul 2017 01:20:17 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

Can you try building from the gawk-4.1-stable branch in the git repo
and let me know if you still have the problem?

I'm also curious what happens if you build from master in the repo.

Thanks,

Arnold

Alexandre Oliva <address@hidden> wrote:

> Hi,
>
> I've upgraded the root in which I create and verify GNU Linux-libre
> tarballs from Fedora/Freed-ora 25 to 26, which brought gawk from 4.1.3 to
> 4.1.4.
>
> With 4.1.3, it used about 1GB of RAM and took some 15 minutes to run.
>
> With 4.1.4, I gave up after 2 hours of CPU time, and the process was at
> 6GB and growing.
>
> I saw a number of regexp changes in the gawk 4.1.3-to-4.1.4 diff, so I
> tried the Fedora 25 binary (4.1.3) on the Fedora 26 root, and it is
> running with the previous memory use.
>
> The command I use to perform this check is:
>
> deblob-check --use-awk linux-libre-4.12.tar.bz2
>
> deblob-check and the tarball can be downloaded from
> http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/
>
> The script generates and runs a gawk script with monster regexps that
> match known blobs, known false positives, and patterns that catch likely
> blobs, and it's running that generated script that's taking up a lot of
> RAM and time.
>
> deblob-check can use sed, python or perl instead of gawk, but gawk used
> to be the best choice for this final checking, because of its low memory
> use compared with sed, and its DFA-based regexp matching, which is not
> available in python and perl.  (For deblobbing proper, python turns out
> to be better because it spends much less start-up time compiling the
> monster regexp.)
>
> I haven't checked whether gawk 4.1.4 still beats the memory efficiency
> of sed, but sed was barely usable for this purpose back then, and gawk
> 4.1.4 is unfortunately turning out to be unusable too.
>
> Any recommendations as to how we could avoid this huge performance
> regression in gawk, short of switching to a different regexp processing
> engine?
>
> Thanks in advance,
>
> -- 
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
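
For comparing gawk builds on figures like those quoted above (CPU time and
peak memory), a small wrapper can record both for a single run. This is
only a sketch under stated assumptions: the script name measure.py and the
measure() helper are hypothetical, it relies on the Unix-only resource
module, and it assumes the gawk under test is the one deblob-check finds
first on PATH.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion and return (exit_code, wall_s, cpu_s, peak_rss).

    peak_rss is ru_maxrss for waited-for children: KiB on Linux. It is a
    high-water mark over every child this process has waited for, so use
    one Python process per measured command.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time (user + system) consumed by the child and its descendants.
    cpu = ((after.ru_utime + after.ru_stime)
           - (before.ru_utime + before.ru_stime))
    return proc.returncode, wall, cpu, after.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    code, wall, cpu, rss = measure(sys.argv[1:])
    print(f"exit={code} wall={wall:.1f}s cpu={cpu:.1f}s peak_rss={rss}")
```

It could be run once per gawk build, for example as
"python3 measure.py deblob-check --use-awk linux-libre-4.12.tar.bz2",
and the printed figures compared between builds.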


