Re: Processing large amounts of files

From: Ricardo Wurmus
Subject: Re: Processing large amounts of files
Date: Mon, 25 Mar 2024 10:25:22 +0100

Liliana Marie Prikler writes:

>> When running with "-l all" I see this:
>>   info: .75 Computing workflow `cat'...
>>   debug: 3.13 Computing script for process `meow'
>>   guix: 3.13 Looking up package `bash-minimal'
>>   guix: 3.13 Opening inferior Guix at
>> `/gnu/store/pb1nkrn3sg6a1j6c4r5j2ahygkf4vkv9-profile'
>>   guix: 4.27 Looking up package `guix'
>>   debug: 4.45 Generating all scripts and their dependencies.
>>   debug: 4.89 Generating all scripts and their dependencies.
>>   run: 6.73 Executing: /bin/sh -c
>> /gnu/store/5idhbvhrwj3p53kkz2vikdn1ypncwj84-gwl-meow.scm '((inputs
>> "/tmp/meow/0" ...
>>   process: 8.80 In execvp of /bin/sh: Argument list too long
>>   error: 8.80 Wrong type argument in position 1: #f
>> This at least tells us that the last error here is due to sh refusing
>> to run.
> Good to know, and I thought it'd be just that, but… shouldn't this
> failure to invoke sh be caught through something?

Yes, it really should.  This may be a problem with how we capture stdout
and stderr.  I'll look into it.
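
A minimal sketch of the kind of check that appears to be missing, written
in Python rather than Guile purely for illustration. `run_script` is a
hypothetical helper, not GWL code. It assumes the failure is the kernel
rejecting the exec call with E2BIG, which is what "Argument list too long"
corresponds to. On Linux a single argument string is typically capped at
128 KiB, so one huge quoted inputs list can hit the limit even when the
same files passed as separate arguments would not.

```python
import errno
import subprocess


def run_script(script, argument):
    """Run `script` via /bin/sh with one positional argument.

    Hypothetical helper: turns the low-level E2BIG failure into an
    explicit error instead of letting it surface later as an
    unrelated one.
    """
    try:
        return subprocess.run(
            # "sh" fills $0; `argument` becomes $1 inside the script.
            ["/bin/sh", "-c", script, "sh", argument],
            capture_output=True,
            text=True,
            check=True,
        )
    except OSError as err:
        if err.errno == errno.E2BIG:
            raise RuntimeError(
                f"cannot execute /bin/sh: argument of {len(argument)} "
                "characters exceeds the system limit"
            ) from err
        raise
```

The point is only that the exec failure is reported where it happens,
with the size of the offending argument, rather than being lost while
stdout and stderr are captured.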

>> > For comparison:
>> >   time cat /tmp/meow/{0..7769}
>> >   […]
>> >   
>> >   real  0m0,144s
>> >   user  0m0,049s
>> >   sys   0m0,094s
>> > 
>> > It takes GWL 6 times longer to compute the workflow than to create
>> > the inputs in Guile, and 600 times longer than to actually execute
>> > the shell command.  I think there is room for improvement :)
>> Yeah, not good.  Do you have any recommendations?
> We already talked about this in response to your second mail, but (LRU)
> caching of things that can be cached would be an approach to take.
> Perhaps there are also inefficiencies in auto-connecting inputs – not
> exhibited by this example, but thinkable.
> Design-wise, we might need a way of splitting large workflows anyhow.
> Files and environment variables work, but feel clunky at the moment,
> and particular files remind me about recursive make… maybe when I get
> the time, I can code something up and then look at ways for
> simplification.
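
The LRU caching idea can be sketched as follows, in Python for
illustration only. `lookup_package` and `compute_scripts` are hypothetical
stand-ins, not GWL procedures, and the sketch assumes that the repeated
work is a pure lookup keyed by package name.

```python
from functools import lru_cache

# Records every lookup that actually runs, so cache hits are visible.
lookups = []


@lru_cache(maxsize=128)
def lookup_package(name):
    """Hypothetical stand-in for an expensive package lookup."""
    lookups.append(name)
    return ("package", name)


def compute_scripts(package_names):
    """Resolve the package for each process, reusing cached results."""
    return [lookup_package(name) for name in package_names]
```

With a thousand processes that all need the same package, only the first
call does the real lookup. The rest are served from the cache, and
`lookup_package.cache_info()` reports the hit and miss counts.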

I'd be very happy to see a rough proposal and/or patches.  GWL is
currently unburdened because it has hardly any active or vocal users,
so I'm willing to evolve it in a direction that serves actual


