guix-commits

03/03: website: Add post about lzipped substitutes.


From: Ludovic Courtès
Subject: 03/03: website: Add post about lzipped substitutes.
Date: Mon, 17 Jun 2019 08:51:55 -0400 (EDT)

civodul pushed a commit to branch master
in repository guix-artwork.

commit fafd36a62ef00aadc7ba1b3988215286f689d893
Author: Ludovic Courtès <address@hidden>
Date:   Mon Jun 17 14:24:21 2019 +0200

    website: Add post about lzipped substitutes.
    
    * website/posts/lzip.md: New file.
---
 website/posts/lzip.md | 245 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)

diff --git a/website/posts/lzip.md b/website/posts/lzip.md
new file mode 100644
index 0000000..ebd4c44
--- /dev/null
+++ b/website/posts/lzip.md
@@ -0,0 +1,245 @@
+title: Substitutes are now available as lzip
+date: 2019-06-17 14:30
+author: Ludovic Courtès
+tags: Scheme API
+---
+
+For a long time, our build farm at ci.guix.gnu.org has been delivering
+[substitutes](https://www.gnu.org/software/guix/manual/en/html_node/Substitutes.html)
+(pre-built binaries) compressed with gzip.  Gzip was never the best
+choice in terms of compression ratio, but it was a reasonable and
+convenient choice: it’s rock-solid, and zlib made it easy for us to have
+[Guile
+bindings](https://git.savannah.gnu.org/cgit/guix.git/tree/guix/zlib.scm)
+to perform in-process compression in our multi-threaded [`guix
+publish`](https://www.gnu.org/software/guix/manual/en/html_node/Invoking-guix-publish.html)
+server.
+
+With the exception of building software from source, downloads take up
+most of the time spent on Guix package upgrades.  If users can download
+less, upgrades become faster, and happiness ensues.  The time has come to
+improve on this, and starting from early June, Guix can publish and fetch
+[lzip](https://nongnu.org/lzip/)-compressed substitutes, in addition to
+gzip.
+
+# Lzip
+
+[Lzip](https://nongnu.org/lzip/) is a relatively little-known
+compression format, initially developed by Antonio Diaz Diaz ca. 2008.
+It has several C and C++ implementations with surprisingly few lines of
+code, which is always reassuring.  One of its distinguishing features is
+a very good compression ratio with reasonable CPU and memory
+requirements, [according to benchmarks published by the
+authors](https://nongnu.org/lzip/lzip_benchmark.html).
+
+[Lzlib](https://nongnu.org/lzip/lzlib.html) provides a well-documented C
+interface, and Pierre Neidhardt set out to write bindings for that
+library, which eventually landed as the [`(guix lzlib)`
+module](https://git.savannah.gnu.org/cgit/guix.git/tree/guix/lzlib.scm).
+
+With this in place we were ready to start migrating our tools, and then
+our build farm, to lzip compression, so we can all enjoy smaller
+downloads.  Well, easier said than done!
+
+# Migrating
+
+The compression format used for substitutes is not a core component like
+it can be in “traditional” binary package formats [such as
+`.deb`](https://lwn.net/Articles/789449/), since Guix is conceptually a
+“source-based” distro.  However, deployed Guix installations did not
+support lzip, so we couldn’t just switch our build farm to lzip
+overnight; we needed to devise a transition strategy.
+
+Guix asks for the availability of substitutes over HTTP.  For example, a
+question such as:
+
+> “Dear server, do you happen to have a binary of
+> `/gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2` that I could download?”
+
+translates to an HTTP GET of
+[https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo](https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo),
+which returns something like:
+
+```
+StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
+URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
+Compression: gzip
+NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9
+NarSize: 134407424
+References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 …
+FileSize: 48501141
+System: x86_64-linux
+Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv
+Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
+```
+
+(This narinfo format is inherited from [Nix](https://nixos.org/nix/) and
+implemented
+[here](https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/substitute.scm?id=121d9d1a7a2406a9b1cbe22c34343775f5955b34#n283)
+and
+[here](https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm?id=121d9d1a7a2406a9b1cbe22c34343775f5955b34#n265).)
+This tells us we can download the actual binary from
+`/nar/gzip/…-emacs-26.2`, and that it will be about 46 MiB (the
+`FileSize` field).  This is what `guix publish` serves.
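
The lookup described above can be sketched in a few lines of Guile. This is a minimal illustration, not the code in `guix/scripts/substitute.scm`: `store-path->narinfo-url`, `narinfo-body->alist`, and `bytes->mib` are hypothetical helpers, and the sketch assumes the usual `/gnu/store/HASH-NAME` layout where HASH is 32 characters long.

```scheme
(use-modules (srfi srfi-1))   ; filter-map

;; Hypothetical helper: the narinfo URL a client would GET for STORE-PATH.
;; Assumes STORE-PATH looks like /gnu/store/HASH-NAME, where HASH is the
;; 32-character base32 string identifying the store item.
(define (store-path->narinfo-url server store-path)
  (let* ((base (basename store-path))   ; "HASH-NAME"
         (hash (substring base 0 32)))  ; "HASH"
    (string-append server "/" hash ".narinfo")))

;; Hypothetical helper: turn a narinfo body into a list of (KEY . VALUE)
;; pairs.  Pairs are kept in file order and keys may repeat.
(define (narinfo-body->alist body)
  (filter-map (lambda (line)
                (let ((colon (string-index line #\:)))
                  (and colon
                       (cons (substring line 0 colon)
                             (string-trim (substring line (+ colon 1)))))))
              (string-split body #\newline)))

;; FileSize is in bytes; 1 MiB = 2^20 bytes.
(define (bytes->mib bytes)
  (/ bytes (expt 2 20) 1.))
```

Applied to the Emacs store item, `store-path->narinfo-url` yields the narinfo URL quoted above, and `(bytes->mib 48501141)` gives roughly 46.25, the "about 46 MiB" in the text.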
+
+The trick we came up with was to allow `guix publish` to advertise
+several URLs, one per compression format.  Thus, for recently built
+substitutes, we get something [like
+this](https://ci.guix.gnu.org/mvhaar2iflscidl0a66x5009r44fss15.narinfo):
+
+```
+StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
+URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
+Compression: gzip
+FileSize: 30872887
+URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
+Compression: lzip
+FileSize: 18829088
+NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy
+NarSize: 94372664
+References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 …
+System: x86_64-linux
+Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv
+Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
+```
+
+Notice that there are two occurrences of the `URL`, `Compression`, and
+`FileSize` fields: one for gzip, and one for lzip.  Old Guix instances
+will just pick the first one, gzip; newer Guix will pick whichever
+supported method provides the smallest `FileSize`, usually lzip.  This
+will make migration trivial in the future, should we add support for
+other compression methods.
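
The client-side choice can be modeled as follows. This is a simplified sketch, not Guix's actual implementation: each advertised download is represented as a hypothetical `(URL COMPRESSION FILE-SIZE)` list, in the order the narinfo lists them, using the GIMP values above; `old-client-choice` and `new-client-choice` are illustrative names.

```scheme
(use-modules (srfi srfi-1)    ; first, second, third, fold, filter
             (ice-9 match))   ; match-lambda

;; Hypothetical representation of the two entries in the GIMP narinfo.
(define advertised
  '(("nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12" "gzip" 30872887)
    ("nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12" "lzip" 18829088)))

;; An old client just takes the first advertised entry.
(define (old-client-choice entries)
  (first entries))

;; A newer client keeps only the entries whose compression it supports,
;; then picks the one with the smallest file size; #f if none is usable.
(define (new-client-choice entries supported)
  (let ((usable (filter (match-lambda
                          ((url compression size)
                           (member compression supported)))
                        entries)))
    (and (pair? usable)
         (fold (lambda (entry best)
                 (if (< (third entry) (third best)) entry best))
               (first usable)
               (cdr usable)))))
```

Choosing by size rather than by a fixed preference order means a future method only has to be added to the client's `supported` list, while listing gzip first keeps old clients working.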
+
+Users need to upgrade their Guix daemon to benefit from lzip.  On a
+“foreign distro”, simply run `guix pull` as root.  On standalone Guix
+systems, run `guix pull && sudo guix system reconfigure
+/etc/config.scm`.  In both cases, the daemon has to be restarted, be it
+with `systemctl restart guix-daemon.service` or with `herd restart
+guix-daemon`.
+
+# First impressions
+
+This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a
+week.  Specifically, we run `guix publish -C gzip:9 -C lzip:9`, meaning
+that we use the highest compression ratio for both compression methods.
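
On a Guix System server, a setup equivalent to `guix publish -C gzip:9 -C lzip:9` might be declared in the `services` field of the `operating-system` configuration. This is a sketch assuming a Guix version whose `guix-publish-configuration` has a `compression` field taking a list of method/level pairs (check the manual for your version); the host and port values are illustrative only.

```scheme
;; Fragment for the 'services' field of an operating-system declaration;
;; assumes (use-service-modules base) appears at the top of the file.
(service guix-publish-service-type
         (guix-publish-configuration
          (host "0.0.0.0")      ; illustrative: listen on all interfaces
          (port 8080)           ; illustrative port
          ;; One (METHOD LEVEL) entry per advertised compression,
          ;; mirroring 'guix publish -C gzip:9 -C lzip:9'.
          (compression '(("gzip" 9) ("lzip" 9)))))
```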
+
+Currently, only a small subset of the package substitutes are available
+as both lzip and gzip; those that were already available as gzip have
+not been recompressed.  The following Guile program that taps into the
+API of [`guix
+weather`](https://www.gnu.org/software/guix/manual/en/html_node/Invoking-guix-weather.html)
+allows us to get some insight:
+
+```scheme
+(use-modules (gnu) (guix)
+             (guix monads)
+             (guix scripts substitute)
+             (srfi srfi-1)
+             (ice-9 match))
+
+(define all-packages
+  (@@ (guix scripts weather) all-packages))
+
+(define package-outputs
+  (@@ (guix scripts weather) package-outputs))
+
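+;; Return the narinfos of all package outputs that ci.guix.gnu.org
+;; offers as lzip.
+(define (fetch-lzip-narinfos)
+  (mlet %store-monad ((items (package-outputs (all-packages))))
+    (return
+     (filter (lambda (narinfo)
+               (member "lzip" (narinfo-compressions narinfo)))
+             (lookup-narinfos "https://ci.guix.gnu.org" items)))))
+
+;; Ratio of the lzip size to the gzip size; assumes exactly two
+;; compressions, listed in the order gzip, then lzip.
+(define (lzip/gzip-ratio narinfo)
+  (match (narinfo-file-sizes narinfo)
+    ((gzip lzip)
+     (/ lzip gzip))))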
+
+(define (average lst)
+  (/ (reduce + 0 lst)
+     (length lst) 1.))
+```
+
+Let’s explore this at the
+[REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop):
+
+```scheme
+scheme@(guile-user)> (define lst
+                       (with-store s
+                         (run-with-store s (fetch-lzip-narinfos))))
+computing 9,897 package derivations for x86_64-linux...
+updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
+scheme@(guile-user)> (length lst)
+$4 = 2275
+scheme@(guile-user)> (average (map lzip/gzip-ratio lst))
+$5 = 0.7398994395478715
+```
+
+As of this writing, around 20% of the package substitutes are
+available as lzip, so take the following stats with a grain of salt.
+Among those, the lzip-compressed substitute is on average 26% smaller
+than the gzip-compressed one.  What if we consider only packages bigger
+than 5 MiB uncompressed?
+
+```scheme
+scheme@(guile-user)> (define biggest
+                       (filter (lambda (narinfo)
+                                 (> (narinfo-size narinfo)
+                                    (* 5 (expt 2 20))))
+                               lst))
+scheme@(guile-user)> (average (map lzip/gzip-ratio biggest))
+$6 = 0.5974238562384483
+scheme@(guile-user)> (length biggest)
+$7 = 440
+```
+
+For those packages, lzip yields substitutes that are 40% smaller on
+average.  Pretty nice!  Lzip decompression is slightly more
+CPU-intensive than gzip decompression, but downloads are
+bandwidth-bound, so the benefits clearly outweigh the costs.
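
To make these averages concrete, a ratio converts to savings as 100 × (1 - ratio), and the GIMP narinfo shows what that means for transfer time. The sketch below is a back-of-the-envelope calculation: `percent-smaller` and `download-seconds` are hypothetical helpers, the 10 Mbit/s link speed is an assumed figure rather than a measurement, and decompression time is ignored.

```scheme
;; Savings as "percent smaller", given a lzip/gzip size ratio.
(define (percent-smaller ratio)
  (* 100 (- 1 ratio)))

;; Seconds needed to transfer SIZE bytes over a link of MBITS megabit/s
;; (1 megabit = 10^6 bits), ignoring latency and protocol overhead.
(define (download-seconds size mbits)
  (/ (* size 8.) (* mbits 1e6)))

;; Averages printed in the REPL session above.
(define all-ratio 0.7398994395478715)
(define big-ratio 0.5974238562384483)

;; GIMP 2.10.12 FileSize values (bytes) from the narinfo shown earlier.
(define gimp-gzip 30872887)
(define gimp-lzip 18829088)

;; Assumed link speed, for illustration only.
(define link-mbits 10)
```

With these definitions, `(percent-smaller all-ratio)` is about 26 and `(percent-smaller big-ratio)` about 40, matching the figures in the text; at the assumed 10 Mbit/s, the GIMP substitute takes roughly 24.7 s as gzip versus 15.1 s as lzip.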
+
+# Going forward
+
+The switch from gzip to lzip has the potential to make upgrades “feel”
+faster, and that is great in itself.
+
+Fundamentally, though, we in this project have always looked at
+peer-to-peer solutions with envy.  Of course, the main motivation is to
+have a community-supported and resilient infrastructure, rather than a
+centralized one, and that vision goes [hand-in-hand with reproducible
+builds](https://www.gnu.org/software/guix/blog/2017/reproducible-builds-a-status-update/).
+
+We started working on [an extension to publish and fetch
+substitutes](https://issues.guix.gnu.org/issue/33899) over
+[IPFS](https://ipfs.io/).  Thanks to its content-addressed nature, IPFS
+has the potential to further reduce the amount of data that needs to be
+downloaded on an upgrade.
+
+The good news is that IPFS developers are also [interested in working
+with package manager
+developers](https://github.com/ipfs/package-managers), and I bet
+there’ll be interesting discussions at [IPFS
+Camp](https://camp.ipfs.io/) in just a few days.  We’re eager to pursue
+our IPFS integration work, and if you’d like to join us and hack the
+good hack, [let’s get in
+touch!](https://www.gnu.org/software/guix/contact/)
+
+
+#### About GNU Guix
+
+[GNU Guix](https://www.gnu.org/software/guix) is a transactional package
+manager and an advanced distribution of the GNU system that [respects
+user
+freedom](https://www.gnu.org/distros/free-system-distribution-guidelines.html).
+Guix can be used on top of any system running the kernel Linux, or it
+can be used as a standalone operating system distribution for i686,
+x86_64, ARMv7, and AArch64 machines.
+
+In addition to standard package management features, Guix supports
+transactional upgrades and roll-backs, unprivileged package management,
+per-user profiles, and garbage collection.  When used as a standalone
+GNU/Linux distribution, Guix offers a declarative, stateless approach to
+operating system configuration management.  Guix is highly customizable
+and hackable through [Guile](https://www.gnu.org/software/guile)
+programming interfaces and extensions to the
+[Scheme](http://schemers.org) language.


