guix-devel

Leveraging the synergy of deduplication


From: Ludovic Courtès
Subject: Leveraging the synergy of deduplication
Date: Wed, 25 Mar 2015 14:46:46 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

Currently the daemon implements a simple yet efficient way to
deduplicate files that are identical across store items.  The
/gnu/store/.links directory contains hard links to files in the store;
the link name is the base32-encoded SHA256 of the file.  When the
daemon adds a new file to the store, it checks /gnu/store/.links for an
identical file and, if one exists, hard-links to it instead of storing
a second copy.
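
For illustration, here’s a rough Guile sketch of that deduplication
step (the real logic lives in the daemon’s C++ code); it assumes the
(guix hash) and (guix base32) modules from the Guix tree:

  (use-modules (guix hash) (guix base32))

  (define %links-directory "/gnu/store/.links")

  (define (deduplicate file)
    ;; Hard-link FILE to its content-addressed twin under .links,
    ;; creating that twin when FILE is its first occurrence.
    (let* ((hash   (file-sha256 file))
           (target (string-append %links-directory "/"
                                  (bytevector->nix-base32-string hash))))
      (unless (file-exists? target)
        (link file target))
      (unless (= (stat:ino (stat file)) (stat:ino (stat target)))
        ;; FILE duplicates an existing file: swap it for a hard link,
        ;; going through a temporary name so the swap is atomic.
        (let ((tmp (string-append file ".tmp-link")))
          (link target tmp)
          (rename-file tmp file)))))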

When installing, say, two different variants of texlive, which in
practice are 90% bit-identical, there’s a lot of deduplication
happening.  However, we still end up downloading the whole texlive
archive just to realize that we already have most of its files in store.

A solution to this would be to change the HTTP substitute protocol.
‘guix publish’ could serve content-addressed files.  For instance,

  http://example.org/1ghws12lrp62vvxxxqmxp7jgxv2p18ihiyq420ag77nh9bw5qsfg.file

would serve the contents of the store file that has the given hash.
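
A minimal sketch of what that route could look like on the ‘guix
publish’ side, using Guile’s (web response) API; the procedure name
‘render-content-addressed-file’ is made up for illustration:

  (use-modules (web response)
               (ice-9 binary-ports))

  (define (render-content-addressed-file hash)
    ;; Serve the store file whose nix-base32 SHA256 is HASH, straight
    ;; from /gnu/store/.links; no database lookup needed.
    (let ((file (string-append "/gnu/store/.links/" hash)))
      (if (file-exists? file)
          (values (build-response
                   #:headers '((content-type
                                . (application/octet-stream))))
                  (call-with-input-file file get-bytevector-all
                                        #:binary #t))
          (values (build-response #:code 404)
                  "file not found"))))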

The archive format would have to differ from the one currently
implemented by ‘write-file’: for regular files, ‘write-contents’ would
simply write the hash of the contents rather than the contents
themselves, and it would be up to the substituter to fetch that file if
it’s not already in store (which can be determined by looking it up in
/gnu/store/.links).
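
Sketched in Guile, with ‘write-string’ standing for the nar string
primitive from (guix nar), and ‘fetch-content-addressed-file’ and
‘ensure-file’ being made-up names for the HTTP fetch described above:

  (use-modules (guix hash) (guix base32))

  ;; Producer side: regular files carry their hash, not their bytes.
  (define (write-contents file port)
    (write-string "contents-hash" port)
    (write-string (bytevector->nix-base32-string (file-sha256 file))
                  port))

  ;; Consumer side: the substituter checks .links first and fetches
  ;; over HTTP only when the file is genuinely missing.
  (define (ensure-file hash)
    (let ((cached (string-append "/gnu/store/.links/" hash)))
      (if (file-exists? cached)
          cached
          (fetch-content-addressed-file hash))))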

This is not very sophisticated, but has the advantage of being
relatively easy to implement in Guix itself.

The downside is that Hydra would most likely not implement this new
protocol (which would give us another incentive to move away from it).

Thoughts?  Patches?  :-)

Ludo’.

PS: Title inspired by <http://www.sansbullshitsans.com/>.


