Re: Processing large amounts of files
From: Ricardo Wurmus
Subject: Re: Processing large amounts of files
Date: Tue, 26 Mar 2024 22:30:45 +0100
User-agent: mu4e 1.10.8; emacs 29.1
Ricardo Wurmus <rekado@elephly.net> writes:
> Liliana Marie Prikler <liliana.prikler@ist.tugraz.at> writes:
>
>> For comparison:
>> time cat /tmp/meow/{0..7769}
>> […]
>>
>> real 0m0,144s
>> user 0m0,049s
>> sys 0m0,094s
>>
>> It takes GWL 6 times longer to compute the workflow than to create the
>> inputs in Guile, and 600 times longer than to actually execute the
>> shell command. I think there is room for improvement :)
>
> GWL checks if all input files exist before running the command. Part of
> the difference you see here (takes about 2 seconds on my laptop) is GWL
> running FILE-EXISTS? on 7769 files. This happens in prepare-inputs; its
> purpose:
>
> "Ensure that all files in the INPUTS-MAP alist exist and are linked to
> the expected locations. Pick unspecified inputs from the environment.
> Return either the INPUTS-MAP alist with any additionally used input
> file names added, or raise a condition containing the list of missing
> files."
>
> Another significant delay is introduced by the cache mechanism, which
> computes a unique prefix based on the contents of all input files. It's
> not unexpected that this will take a little while, but it's not great
> either.
With commit f4442e409cf05d0c7cc4d6a251626d22efaffe8c it's a little
faster. We used a whole lot of alists, and since looking up a key in an
alist takes time linear in its length, this becomes slow when there are
thousands of inputs. We're now using hash tables instead.
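
To give a rough idea of why the data structure matters at this size,
here is a minimal sketch in Python (not Guile, and not the GWL code).
The helper names make_inputs, alist_ref, lookup_all_alist and
lookup_all_table are made up for this illustration. A list of pairs
stands in for an alist and a dict stands in for a hash table. Looking
up every input once costs on the order of n^2 comparisons with the
list, but only about n hash lookups with the table.

```python
import timeit


def make_inputs(n):
    """Build a list of (file name, index) pairs, standing in for an alist."""
    return [(f"/tmp/meow/{i}", i) for i in range(n)]


def alist_ref(alist, key):
    """Linear scan through the pairs; returns None if the key is absent."""
    for k, v in alist:
        if k == key:
            return v
    return None


def lookup_all_alist(alist):
    """Look up every key by scanning the list: quadratic in total."""
    return [alist_ref(alist, key) for key, _ in alist]


def lookup_all_table(alist):
    """Build a dict (hash table) once, then look up every key in it."""
    table = dict(alist)
    return [table[key] for key, _ in alist]


if __name__ == "__main__":
    # {0..7769} expands to 7770 names.
    inputs = make_inputs(7770)
    for fn in (lookup_all_alist, lookup_all_table):
        seconds = timeit.timeit(lambda: fn(inputs), number=1)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

The absolute timings depend on the machine and say nothing about GWL
itself; only the gap between the two approaches is the point.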
--
Ricardo