bug-gawk


From: arnold
Subject: Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever
Date: Thu, 13 Jul 2017 01:42:21 -0600
User-agent: Heirloom mailx 12.4 7/29/08

If neither of those is any better, then let's work offline to isolate
when things broke. "git bisect" is quite good at that.  :-) If possible,
I'd prefer to fix the problem instead of leaving things alone.
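
For readers unfamiliar with it, "git bisect" does a binary search over the
commit history between a known-good and a known-bad revision. The idea can be
sketched in Python as below. This is only an illustration: the commit names,
the position of the regression, and the is_bad() check are all hypothetical
placeholders, not taken from the gawk repository.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical linear history between the two releases (placeholder names).
history = ["gawk-4.1.3"] + [f"commit-{i}" for i in range(1, 100)] + ["gawk-4.1.4"]
culprit_index = 37  # assumption for the demo: the regression landed here

tested = []  # records which commits had to be built and tested


def is_bad(commit: str) -> bool:
    # Stand-in for "build this commit and time the failing workload".
    tested.append(commit)
    return history.index(commit) >= culprit_index


result = first_bad(history, is_bad)
print(result, "found after testing", len(tested), "commits")
```

With about a hundred commits in between, only a handful of builds need to be
tested, which is what makes the approach practical for a slow reproducer.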

Thanks,

Arnold

address@hidden wrote:

> Hi.
>
> Can you try building from the gawk-4.1-stable branch in the git repo
> and let me know if you still have the problem?
>
> I'm also curious if you build from master in the repo what happens.
>
> Thanks,
>
> Arnold
>
> Alexandre Oliva <address@hidden> wrote:
>
> > Hi,
> >
> > I've upgraded the root in which I create and verify GNU Linux-libre
> > tarballs from Fedora/Freed-ora 25 to 26, which brought gawk from 4.1.3 to
> > 4.1.4.
> >
> > With 4.1.3, the check used about 1GB of RAM and took some 15 minutes to run.
> >
> > With 4.1.4, I gave up after 2 hours of CPU time, and the process was at
> > 6GB and growing.
> >
> > I saw a number of regexp changes in the gawk 4.1.3 -> 4.1.4 diff, so I took
> > the Fedora 25 binary, and it's running on the Fedora 26 root with the
> > previous memory use.
> >
> > The command I use to perform this check is:
> >
> > deblob-check --use-awk linux-libre-4.12.tar.bz2
> >
> > deblob-check and the tarball can be downloaded from
> > http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/
> >
> > The script generates and runs a gawk script with monster regexps that
> > match known blobs, known false positives, and patterns that catch likely
> > blobs, and it's running that generated script that's taking up a lot of
> > RAM and time.
> >
> > deblob-check can use sed, python or perl instead of gawk, but gawk used
> > to be the best choice for this final checking, because of the low memory
> > use compared with sed, and the DFA-based regexp not available in python
> > and perl.  (For deblobbing proper, python turns out to be better due to
> > its much lower start-up time compiling the monster regexp.)
> >
> > I haven't checked whether gawk 4.1.4 still beats the memory efficiency
> > of sed, but sed was barely usable for this purpose back then, and gawk
> > 4.1.4 is unfortunately turning out to be unusable too.
> >
> > Any recommendations as to how we could avoid this huge performance
> > regression in gawk, short of switching to a different regexp processing
> > engine?
> >
> > Thanks in advance,
> >
> > -- 
> > Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> > You must be the change you wish to see in the world. -- Gandhi
> > Be Free! -- http://FSFLA.org/   FSF Latin America board member
> > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
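
As a rough illustration of the approach described in the quoted message (many
patterns joined into one large alternation, with known false positives
excluded), here is a minimal Python sketch. The patterns and sample text are
invented for the example and are far simpler than anything deblob-check
generates; Python's re module is also a backtracking engine, not the DFA-based
matcher the message refers to.

```python
import re

# Hypothetical pattern lists; the real deblob-check patterns are not shown here.
BLOB_PATTERNS = [r"(?:0x[0-9a-f]{2},\s*){4,}"]           # runs of hex bytes
FALSE_POSITIVE_PATTERNS = [r"crc_table\[\] = \{[^}]*\}"]  # known harmless table


def combine(patterns):
    """Join many patterns into a single alternation, one group per pattern."""
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def suspicious(text):
    """Return blob-like matches not contained in a known false positive."""
    blob_re = combine(BLOB_PATTERNS)
    ok_re = combine(FALSE_POSITIVE_PATTERNS)
    ok_spans = [m.span() for m in ok_re.finditer(text)]
    return [
        m.group(0).strip()
        for m in blob_re.finditer(text)
        if not any(s <= m.start() and m.end() <= e for s, e in ok_spans)
    ]


sample = (
    "u8 crc_table[] = { 0x01, 0x02, 0x03, 0x04, };\n"
    "u8 fw[] = { 0xde, 0xad, 0xbe, 0xef, };\n"
)
hits = suspicious(sample)
print(hits)
```

As the lists grow to thousands of entries, the combined expression becomes very
large, which is why the compile time and memory behaviour of the regexp engine
dominate the run.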


