Re: Processing large amounts of files
From: Ricardo Wurmus
Subject: Re: Processing large amounts of files
Date: Mon, 25 Mar 2024 10:25:22 +0100
User-agent: mu4e 1.10.8; emacs 29.1
Liliana Marie Prikler <liliana.prikler@ist.tugraz.at> writes:
>> When running with "-l all" I see this:
>>
>> info: .75 Computing workflow `cat'...
>> debug: 3.13 Computing script for process `meow'
>> guix: 3.13 Looking up package `bash-minimal'
>> guix: 3.13 Opening inferior Guix at
>> `/gnu/store/pb1nkrn3sg6a1j6c4r5j2ahygkf4vkv9-profile'
>> guix: 4.27 Looking up package `guix'
>> debug: 4.45 Generating all scripts and their dependencies.
>> debug: 4.89 Generating all scripts and their dependencies.
>> run: 6.73 Executing: /bin/sh -c
>> /gnu/store/5idhbvhrwj3p53kkz2vikdn1ypncwj84-gwl-meow.scm '((inputs
>> "/tmp/meow/0" ...
>> process: 8.80 In execvp of /bin/sh: Argument list too long
>> error: 8.80 Wrong type argument in position 1: #f
>>
>> This at least tells us that the last error here is due to sh refusing
>> to run.
> Good to know, and I thought it'd be just that, but… shouldn't this
> failure to invoke sh be caught through something?
Yes, it really should. This may be a problem with how we capture stdout
and stderr. I'll look into it.
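
To illustrate what catching this failure could look like, here is a minimal Python sketch. It is not GWL code: `run_checked` and `estimated_argv_bytes` are hypothetical helpers made up for this example. It assumes a POSIX system, where a failed exec surfaces as an `OSError` whose `errno` is `E2BIG` when the argument list is too long.

```python
import errno
import os
import subprocess


def estimated_argv_bytes(argv):
    """Lower bound on the space argv needs: each string plus its NUL byte.

    The environment and the pointer arrays also count against the limit,
    so the real figure is somewhat higher.
    """
    return sum(len(os.fsencode(arg)) + 1 for arg in argv)


def run_checked(argv):
    """Run argv and return (ok, message) instead of failing obscurely."""
    try:
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    except OSError as e:
        # The exec itself failed; the command never started.
        if e.errno == errno.E2BIG:
            size = estimated_argv_bytes(argv)
            return False, "argument list too long (at least %d bytes)" % size
        return False, "could not execute %s: %s" % (argv[0], e.strerror)
    except subprocess.CalledProcessError as e:
        # The command started but exited with a non-zero status.
        return False, "exited with status %d" % e.returncode
    return True, "ok"
```

The point of the sketch is only that "exec failed" and "command ran and failed" are reported separately, so the first does not end up as an unrelated type error further down.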
>> > For comparison:
>> > time cat /tmp/meow/{0..7769}
>> > […]
>> >
>> > real 0m0,144s
>> > user 0m0,049s
>> > sys 0m0,094s
>> >
>> > It takes GWL 6 times longer to compute the workflow than to create
>> > the inputs in Guile, and 600 times longer than to actually execute
>> > the shell command. I think there is room for improvement :)
>>
>> Yeah, not good. Do you have any recommendations?
> We already talked about this in response to your second mail, but (LRU)
> Caching of things that can be cached would be an approach to take.
> Perhaps there's also inefficiencies in auto-connecting inputs – not
> exhibited by this example, but thinkable.
>
> Design-wise, we might need a way of splitting large workflows anyhow.
> Files and environment variables work, but feel clunky at the moment,
> and particular files remind me about recursive make… maybe when I get
> the time, I can code something up and then look at ways for
> simplification.
I'd be very happy to see a rough proposal and/or patches. GWL is
currently unburdened because it hardly has any active or vocal users,
so I'm willing to evolve it in a direction that serves actual users.
--
Ricardo