Re: Processing large amounts of files

gwl-devel

From:	Ricardo Wurmus
Subject:	Re: Processing large amounts of files
Date:	Tue, 26 Mar 2024 22:30:45 +0100
User-agent:	mu4e 1.10.8; emacs 29.1

Ricardo Wurmus <rekado@elephly.net> writes:

> Liliana Marie Prikler <liliana.prikler@ist.tugraz.at> writes:
>
>> For comparison:
>>   time cat /tmp/meow/{0..7769}
>>   […]
>>   
>>   real       0m0,144s
>>   user       0m0,049s
>>   sys        0m0,094s
>>
>> It takes GWL 6 times longer to compute the workflow than to create the
>> inputs in Guile, and 600 times longer than to actually execute the
>> shell command.  I think there is room for improvement :)
>
> GWL checks whether all input files exist before running the command.
> Part of the difference you see here (it takes about 2 seconds on my
> laptop) is GWL running FILE-EXISTS? on 7769 files.  This happens in
> prepare-inputs, whose purpose is:
>
>   "Ensure that all files in the INPUTS-MAP alist exist and are linked to
>   the expected locations.  Pick unspecified inputs from the environment.
>   Return either the INPUTS-MAP alist with any additionally used input
>   file names added, or raise a condition containing the list of missing
>   files."
>
> Another significant delay is introduced by the cache mechanism, which
> computes a unique prefix based on the contents of all input files.  It's
> not unexpected that this will take a little while, but it's not great
> either.
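The two costs described in the quoted message (one existence check per input, and a cache prefix derived from the contents of every input) can be illustrated with a rough Python sketch. It is not GWL's code, which is written in Guile Scheme. The names `missing_inputs` and `content_prefix` are hypothetical, and hashing file contents with SHA-256 and truncating the digest is an assumption about how a content-derived prefix might be computed, not a description of GWL's cache.

```python
import hashlib
import os
import tempfile


def missing_inputs(paths):
    """Return the input paths that do not exist.

    This costs one filesystem check per input, so thousands of inputs
    mean thousands of checks before any command runs.
    """
    return [p for p in paths if not os.path.exists(p)]


def content_prefix(paths, length=16):
    """Derive a cache prefix from the contents of all input files.

    Assumption: each file is hashed with SHA-256 and the per-file digests
    are combined in sorted path order, so the result does not depend on
    the order in which inputs are given. Every byte of every input is
    read, so the cost grows with the total size of the inputs.
    """
    combined = hashlib.sha256()
    for path in sorted(paths):
        per_file = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 64 KiB chunks until read() returns b"".
            for chunk in iter(lambda: f.read(1 << 16), b""):
                per_file.update(chunk)
        combined.update(per_file.digest())
    return combined.hexdigest()[:length]


if __name__ == "__main__":
    # Small demo with hypothetical inputs in a temporary directory.
    tmp = tempfile.mkdtemp()
    inputs = [os.path.join(tmp, str(i)) for i in range(100)]
    for i, path in enumerate(inputs):
        with open(path, "w") as f:
            f.write(f"meow {i}\n")
    print(missing_inputs(inputs + [os.path.join(tmp, "absent")]))
    print(content_prefix(inputs))
```

Both steps are linear in the number of inputs, and the prefix computation is also linear in their total size, which is why they remain noticeable even once other overheads are removed.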

With commit f4442e409cf05d0c7cc4d6a251626d22efaffe8c it's a little
faster.  We were using a lot of alists, and since every alist lookup
scans the list from the start, this becomes slow when there are
thousands of inputs.  We're now using hash tables.
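To illustrate why the switch helps: Guile's `assoc-ref` walks an alist from the front, so each lookup is linear in the alist's length. If, say, one lookup is done per input, thousands of inputs make the total work quadratic, whereas a hash table (`make-hash-table` and `hash-ref` in Guile) makes each lookup constant time on average. The following Python analogue is only a sketch; the function names are hypothetical and it does not reproduce the actual changes in the commit.

```python
def lookup_all_alist(pairs, keys):
    """Look up every key by scanning a list of (key, value) pairs.

    Like Guile's assoc-ref, each lookup walks the list from the front,
    so looking up n keys in an n-element list costs O(n^2) comparisons.
    Returns the values (None when absent) and the number of comparisons.
    """
    comparisons = 0
    values = []
    for key in keys:
        found = None
        for k, v in pairs:
            comparisons += 1
            if k == key:
                found = v
                break
        values.append(found)
    return values, comparisons


def lookup_all_table(pairs, keys):
    """Build a hash table once (O(n)), then do O(1) average lookups.

    Assumes keys are unique: with duplicates, an alist lookup returns the
    first match while dict() keeps the last one.
    """
    table = dict(pairs)
    return [table.get(key) for key in keys]


if __name__ == "__main__":
    n = 2000
    pairs = [(f"/tmp/meow/{i}", i) for i in range(n)]
    keys = [k for k, _ in pairs]
    values, comparisons = lookup_all_alist(pairs, keys)
    assert values == lookup_all_table(pairs, keys)
    print(comparisons)  # prints 2001000, i.e. n*(n+1)/2
```

Both functions return the same values; the difference is that the alist version's work grows with the square of the number of inputs, while the hash table version's grows roughly linearly.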

-- 
Ricardo