[bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
From: Ludovic Courtès
Subject: [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
Date: Fri, 26 Jan 2024 18:25:37 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

Oops, I forgot to Cc: the fine people for the cover letter; fixed!
See <https://issues.guix.gnu.org/68741>.
Ludovic Courtès <ludo@gnu.org> skribis:
> Hello Guix!
>
> For those who’ve been following along, you might remember that the
> main impedance mismatch between SWH and Guix is that SWH uses Git
> tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
> hashes (and possibly other hash functions in the future):
>
>
> https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
>
> Because of this, the SWH fallback path for ‘git-download’ had two
> options:
>
>   1. If ‘git-reference’ specifies a full SHA1 commit ID, it would
>      look it up on SWH and fetch it.
>
>   2. If ‘git-reference’ specifies a tag, which is perhaps the
>      majority of cases, Guix would ask SWH for the commit that once
>      corresponded to that tag at that URL, and then fetch it.
>
> Case #1 is ideal: it’s content-addressed. Case #2 is brittle: we’re
> hoping that the tag hasn’t been modified and that the URL hasn’t been
> reused for something else; if that’s not the case, SWH might return
> the “wrong” commit and we end up fetching something unrelated.
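To make the two cases concrete, here is a minimal sketch in plain Guile of how a reference string could be sorted into them. Both ‘full-sha1-commit?’ and ‘swh-fallback-case’ are hypothetical names made up for illustration; this is not how (guix swh) or ‘git-download’ is necessarily written.

```scheme
;; Hypothetical helper, for illustration only: return #t if COMMIT, the
;; string found in a 'git-reference', is a full 40-digit hexadecimal
;; SHA1 commit ID.
(define (full-sha1-commit? commit)
  (and (= 40 (string-length commit))
       (string-every char-set:hex-digit commit)))

;; Sort a reference into one of the two fallback cases.
(define (swh-fallback-case commit)
  (if (full-sha1-commit? commit)
      'by-commit-id    ;case #1: content-addressed lookup
      'by-tag))        ;case #2: origin URL + tag name, brittle

;; Example strings; the first is merely a 40-digit hex string.
(swh-fallback-case "8bee6bb9aaaf35c36fe325675d1eb2daebd69c25") ;=> by-commit-id
(swh-fallback-case "v0.1.3")                                   ;=> by-tag
```

Anything that is not a full commit ID falls into the second case, where the result depends on what the archive recorded for that URL and tag at visit time.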
>
> The good news is that our friends at SWH have just deployed a new
> version of their code that lets us look up directories by some
> “external identifier” (“ExtID”), among which there’s ‘nar-sha256’:
>
> https://archive.softwareheritage.org/api/1/extid/doc/
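As a rough illustration of what such a lookup amounts to, the sketch below builds a request URL for a ‘nar-sha256’ ExtID. It assumes the endpoint layout is /api/1/extid/TYPE/hex:VALUE/ with the hash in hexadecimal; that layout is an assumption to check against the page linked above, and ‘%swh-base-url’ and ‘extid-url’ are hypothetical names, not bindings from (guix swh).

```scheme
;; Assumed base URL of the archive's Web API.
(define %swh-base-url "https://archive.softwareheritage.org")

;; Hypothetical helper: build the ExtID lookup URL for TYPE (e.g.,
;; "nar-sha256") and HEX-VALUE, the hash as a hexadecimal string.
;; The "/api/1/extid/TYPE/hex:VALUE/" layout is an assumption.
(define (extid-url type hex-value)
  (string-append %swh-base-url "/api/1/extid/" type
                 "/hex:" hex-value "/"))

(extid-url "nar-sha256"
           "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63")
```

Note that Guix usually prints hashes in its base32 alphabet, so the hash would need to be converted to hexadecimal before being used this way.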
>
> And that, my friends, makes a huge difference: the impedance mismatch
> is gone, we can now use content-addressing to fetch our stuff from SWH!!
> And that works not just for Git, but also for Mercurial, SVN, CVS, etc.
>
> Well, there’s a caveat: currently the ‘nar-sha256’ ExtID is added only
> on new visits, and apparently it’s not being added yet for Mercurial,
> for unclear reasons. So right now, we can get guile-sqlite3 0.1.3 (Git) by
> nar-sha256, but we cannot get guile-wisp (hg) nor in fact most things.
> That’ll improve over time though, and SWH comrades are open to adding
> those ExtIDs retroactively.
>
> The patches that follow do several things:
>
>   1. Follow redirects in the Vault: (guix swh) previously did not
>      do that (oops!) but the newly-deployed Vault now responds with
>      302 redirects so we have to handle that.
>
>   2. Add bindings for the ExtID HTTP interface.
>
>   3. Add ‘swh-download-directory-by-nar-hash’, which does what it
>      says.
>
>   4. Use that as the preferred fallback method for ‘git-fetch’.
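For the first item, the idea of following redirects can be sketched independently of any HTTP client. In the plain-Guile sketch below, FETCH is a hypothetical procedure standing in for the actual request: it takes a URL and returns two values, the status code and either the redirect target or the response body. The name ‘fetch-following-redirects’, the set of status codes, and the limit of five hops are illustrative choices, not what ‘vault-fetch’ does.

```scheme
(use-modules (ice-9 receive))

;; Call FETCH on URL and keep following redirects until a
;; non-redirect response arrives or MAX-REDIRECTS hops were followed.
;; FETCH is hypothetical: it returns two values, the HTTP status code
;; and a payload (the 'Location' target for redirects, else the body).
(define* (fetch-following-redirects fetch url #:key (max-redirects 5))
  (let loop ((url url)
             (remaining max-redirects))
    (receive (code payload)
        (fetch url)
      (if (memv code '(301 302 303 307 308))
          (if (zero? remaining)
              (error "too many redirects" url)
              (loop payload (- remaining 1)))
          (values code payload)))))

;; Stub standing in for the network: one 302, then a 200.
(define (stub-fetch url)
  (if (string=? url "https://example.org/vault")
      (values 302 "https://example.org/storage/bundle")
      (values 200 "bundle contents")))

(fetch-following-redirects stub-fetch "https://example.org/vault")
```

Bounding the number of hops keeps a redirect loop on the server side from hanging the download.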
>
> Here’s a REPLshot:
>
> scheme@(guile-user)> (lookup-external-id
>                       "nar-sha256"
>                       (content-hash-value
>                        (origin-hash
>                         (package-source
>                          (@ (gnu packages guile) guile-sqlite3)))))
> $43 = #<<external-id>
>         value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63"
>         type: "nar-sha256" version: 0
>         target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153"
>         target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
> scheme@(guile-user)> (swh-download-directory-by-nar-hash
>                       (content-hash-value
>                        (origin-hash
>                         (package-source
>                          (@ (gnu packages guile) guile-sqlite3))))
>                       'sha256 "/tmp/gsql")
> SWH: found directory with nar-sha256 hash
> 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at
> 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
> $46 = #t
>
> Huge thanks to everyone over at #swh-devel for helping me out
> over the past few days!
>
> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, and assess the situation with SVN and sub-directories
> to see what can be done.
>
> Thoughts?
>
> Ludo’.
>
> PS: Apologies for the wall of text!
>
> Ludovic Courtès (6):
>   swh: ‘vault-fetch’ follows redirects.
>   swh: Add bindings for the “ExtID” API.
>   swh: Add ‘swh-download-directory-by-nar-hash’.
>   lint: archival: Check with ‘lookup-directory-by-nar-hash’.
>   git-download: Download from SWH by nar hash when possible.
>   swh: Fix docstring of ‘lookup-directory’.
>
>  guix/build/git.scm                |  20 ++++--
>  guix/git-download.scm             |   4 +-
>  guix/lint.scm                     |  28 +++++---
>  guix/scripts/perform-download.scm |   4 +-
>  guix/swh.scm                      | 113 ++++++++++++++++++++++++++----
>  tests/lint.scm                    |  33 +++++++--
>  tests/swh.scm                     |  21 +++++-
>  7 files changed, 189 insertions(+), 34 deletions(-)
>
>
> base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25