Re: Performance issues with using local-file/interned-file/add-to-store
From: Christopher Baines
Subject: Re: Performance issues with using local-file/interned-file/add-to-store
Date: Sun, 17 Sep 2017 21:52:07 +0100
On Sun, 17 Sep 2017 22:22:32 +0200
address@hidden (Ludovic Courtès) wrote:
> Hello!
>
> Christopher Baines <address@hidden> skribis:
>
> > So I've been playing around with managing some data files with
> > Guix. On the whole, this is working quite nicely so far, but I'm
> > having some performance issues with using the local-file gexp.
> >
> > Using it for large files (~1 to ~4 GB in my case) causes the
> > guix-daemon to use a large amount of CPU and memory. This isn't a
> > problem, but as the data files I'm working with don't change, once
> > the file has been added to the store, it doesn't need to be added
> > again.
>
> The memory issue in guix-daemon is a Known Problem, see
> <https://bugs.gnu.org/23666>.
>
> I guess it wasn’t a pressing issue until now because almost all our
> use cases were dealing with small files.
Ah, ok, that makes sense then. Maybe this can be improved on with the
Guile guix-daemon...
> > The slow performance means that even if nothing needs adding to
> > the store or building, it can take a while to work that out, as all
> > the big files that you are using have to be added to the store
> > again anyway.
> >
> > It would be good for me to have the option to cache the result of
> > running add-to-store through the local-file gexp, perhaps using the
> > full filename, or the hash as the cache key?
>
> Internally, (guix store) has a cache (see ‘add-to-store-cache’) to
> make sure that, during a session, the ‘add-to-store’ RPC for a given
> file is done once only.
I did see this, but didn't consider it when thinking about caching. It
does cache the result, but only for the duration of a session, and with
the way I'm working at the moment, that doesn't persist long enough.
> If you have large files, that single RPC can already be a lot, though.
>
> In the “RPC pipelining” thread¹, I proposed a patch that allows us to
> avoid actually making the ‘add-to-store’ RPC. The patch changes
> ‘add-to-store’ to compute the resulting store file name locally,
> without actually making the RPC, and makes that RPC at a later time.
>
> This approach is not helpful performance-wise for small files and when
> talking to a local daemon. However, it could serve as a trick for
> large files, where we could do something like:
>
> 1. Compute store file name for large file on the client side;
>
> 2. Call ‘add-temp-root’ for that store item; if that works, that
> means it’s already in store, otherwise we need to do ‘add-to-store’.
>
> The downside is that #1 requires traversing the whole file, so maybe
> it doesn’t help.
I'll go and have a read of that thread again, but my impression is that
it might open up the opportunity to forgo the add-to-store if you can
confirm that the relevant store already has the item in it.
Regarding the small/large file tradeoff, a file size threshold could be
hardcoded and used to decide which method to use.
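To illustrate the threshold idea, here is a minimal sketch in Python
(purely for illustration, this is not Guix code). The cutoff value and
the names choose_method, "check-then-add" and "add-to-store" are
assumptions of mine; nothing in this thread fixes a particular number.

```python
import os

# Hypothetical cutoff: the thread does not propose a specific value.
LARGE_FILE_THRESHOLD = 100 * 1024 * 1024  # 100 MiB


def choose_method(path, threshold=LARGE_FILE_THRESHOLD):
    """Pick a strategy for PATH based only on its size.

    Large files take the "compute the name locally, then check the
    store" route; small files go straight to add-to-store.
    """
    size = os.path.getsize(path)
    if size >= threshold:
        return "check-then-add"
    return "add-to-store"
```

Checking the size only needs a stat call, so the decision itself costs
nothing compared to reading the file.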
> Hmm…
>
> Of course we could also have a cache in ~/.cache/guix for all this,
> as a last resort.
That was something I was thinking of, caching things with the key being
the filename or hash, but not doing this by default, as it probably
wouldn't improve the general use case.
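Roughly what I have in mind, again as a Python sketch rather than real
Guix code. The JSON cache file, the function names and the add_to_store
callback are all hypothetical. I have also assumed a key made of the
absolute file name plus size and modification time, since keying on the
file name alone would go stale if the file changed, and keying on the
hash means reading the whole file.

```python
import json
import os


def cache_key(path):
    """Build a key from the absolute file name, size and mtime."""
    st = os.stat(path)
    return f"{os.path.abspath(path)}:{st.st_size}:{st.st_mtime_ns}"


def cached_store_path(path, cache_file, add_to_store):
    """Return the store path for PATH, calling ADD_TO_STORE on a miss.

    ADD_TO_STORE is a hypothetical callback that takes a file name and
    returns the resulting store path.
    """
    try:
        with open(cache_file) as f:
            cache = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        cache = {}

    key = cache_key(path)
    if key not in cache:
        cache[key] = add_to_store(path)
        cache_dir = os.path.dirname(os.path.abspath(cache_file))
        os.makedirs(cache_dir, exist_ok=True)
        with open(cache_file, "w") as f:
            json.dump(cache, f)
    return cache[key]
```

One caveat: a cached entry says nothing about whether the item is still
in the store, so a real version would have to check that before
trusting the cached path.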
> Thoughts?
There does seem to be an intersection between the pipelining
requirements for computing the store file name, and the potential to
use that name in working out if the add-to-store even needs to happen.
Maybe computing the store file name for large files, and using that to
avoid expensive add-to-store calls is a good starting point?
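As a sketch of that flow (Python for illustration only, not Guix code):
the SHA-256 digest below merely stands in for the real store file name
computation, and in_store and add_to_store are hypothetical callbacks,
the first playing the role of the ‘add-temp-root’ probe mentioned
above.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash PATH in fixed-size chunks so memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def ensure_in_store(path, in_store, add_to_store):
    """Add PATH only if the store does not already have it.

    Returns (name, added), where ADDED says whether the expensive
    add_to_store call was actually made.
    """
    name = file_digest(path)  # stand-in for the store file name
    if in_store(name):
        return name, False
    add_to_store(path)
    return name, True
```

This still reads the whole file once to compute the name, which is the
downside already noted, but it avoids sending the contents to the
daemon when the item is already there.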