guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Prototype register-path


From: Ludovic Courtès
Subject: Re: [PATCH] Prototype register-path
Date: Thu, 08 Jun 2017 18:59:52 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Hi reepca,

Caleb Ristvedt <address@hidden> skribis:

> address@hidden (Ludovic Courtès) writes:

[...]

>> “make check TESTS=tests/store.scm” passes now with the fixes you posted
>> today (‘getenv’ & co.)
>
> I end up with 2 failed tests, but neither of them are the register-path
> one. 

Could you send tests/store.log?

>> For deduplication, you already know the (guix hash) API that will allow
>> you to do that I guess.  The “deduplication database” is simply the
>> /gnu/store/.links directory.  Each entry there is a hard link to a file;
>> the name of the entry is the hex representation of the sha256 hash of
>> that file.  So when inserting a file in the store, all you need is to
>> look up its hash in /gnu/store/.links and hard-link from there.  See
>> what I mean?
>
> And if it isn't in /gnu/store/.links, put a hardlink there? Also, when
> you say "when inserting a file", what do you mean by that? I was under
> the impression that the paths given to register-path were already in the
> store but simply weren't in the database or otherwise "officially" in
> the store. Is my assumption incorrect?

Your assumption is correct, sorry for the confusion.

‘register-path’ gets called when the files are already in the store and
the deduplication phase therefore happens when the files are already in
the store.  This can be seen in guix-register.cc, which does:

  store->registerValidPaths (infos);
  if (optimize)
    store->optimisePath (prefix + i.path);

I suppose we could add a #:optimize? parameter to ‘register-path’, which
would default to #t.

> Less on-topic, I don't really know how sha256 works, but if it is a
> hashing function, there must be a possibility (albeit small) of
> collisions, right? Is it worth developing a strategy for handling them?

No.  (There’s a paper called “Compare-by-Hash: A Reasoned Analysis” by
John Black that discusses this, if you want to read more.  :-))

>> Some minor suggestions about the code:
>
> Implemented all of these except maybe half of the commit log one... I'm
> not sure what the commit message headline should be, so I sort of
> guessed based on some of the ones in recent history, and I'm not sure
> I've got the formatting quite right yet, but it should be a lot closer.

Awesome!  AFAICS everything is in place, so I’ll just comment on the
cosmetic side of things; it’s kinda boring but it’s the sort of details
that will allow us to quickly integrate your code, so I think it
matters.

As discussed previously, I think it would make sense to move the
deduplication code to a new (guix store deduplication) module.

  ;; TODO: handling in case the .links directory doesn't exist? For now I'll
  ;; just assume it's the responsibility of whoever makes the store to create
  ;; it.
  (define (deduplicate path store hash)
    "Checks if a store item with hash HASH already exists. If so, replaces PATH
  with a hardlink to the already-existing one. If not, it registers PATH so that
  future duplicates can hardlink to it."
    (let ((links-path (string-append store
                                     "/.links/"
                                     (bytevector->base16-string hash))))
      (if (file-exists? links-path)
          (replace-with-link links-path path)
          (link path links-path))))

Here you could add (mkdir-p links-directory), it doesn’t hurt (‘mkdir-p’
comes from (guix build utils)).  I would change the signature to:

  (define* (deduplicate file hash #:optional (store %store-directory))
    …)

Regarding this:

  (define (get-temp-link target)
    "Like mkstemp!, but instead of creating a new file and giving you the name,
  it creates a new hardlink to TARGET and gives you the name."
    (let try-again ((tempname (tmpnam)))
      (catch
        #t
        (lambda ()
          (link target tempname)
          tempname)
        (lambda ()
          (try-again (tmpnam))))))

I think ‘get-temp-link’ should do the same as ‘optimizePath_’ in the C++
code, which is this:

    Path tempLink = (format("%1%/.tmp-link-%2%-%3%")
        % settings.nixStore % getpid() % rand()).str();

The reason is that this temporary file must live on the same file system
as the store (link(2) fails with EXDEV when call for files that live on
different file systems).

As is it written, the ‘deduplicate’ procedure does not handle the ENOSPC
and EMLINK cases that ‘optimizePath_’ handles.  Would be nice to catch
them.  To check for these specific error cases, you can do:

  (catch 'system-error
    (lambda ()
      (link a b))
    (lambda args
      (cond ((= EMLINK (system-error-errno args)) …)
            …)))

Other things:

  • Conceptually (guix store) should not import (gnu build install),
    which lives on a higher layer.  So I would suggest moving
    ‘reset-timestamps’ to (guix store deduplication) and have (gnu build
    install) use (guix store deduplication).

  • The convention in GNU is to use the term “file name” for file names
    (I often write ‘file’ as the variable name), and to use “path” only
    for search paths like $PATH, etc.  There are a few places where
    ‘register-path’ & co. use ‘path’ instead of ‘file’.

  • In databases.scm, the headers lacks the first line.  :-)

Your commit logs are good BTW.  :-)  For changes in store.scm, I would
use the “store:” prefix in the subject line rather than “guix:” to
clarify what part of the code it touches (you can check “git log
guix/store.scm” to get an idea), but that’s about it.

>> What about pushing your changes to a WIP branch on Savannah or
>> elsewhere?  (If you have an account on Savannah we can give you access.)
>
> I just made an account today as "reepca". Should I submit a "request for
> inclusion"? In the meantime, here's a much less fixup-riddled patch set
> that I think is just about feature-complete as far as register-path is
> concerned:

Excellent.  By the time you’ll read this email I should have added you
on Savannah.

Please make sure the OpenPGP key you’ll use to sign your commits is at
<https://savannah.gnu.org/users/reepca>, and send a reply to this
message signed with your key.

Also please read ‘HACKING’ about commit access to the repo.  I think
you’ll be working mostly on a branch, say, ‘scheme-daemon’.  You can
push your changes on the new branch to begin with.

A possible next task would be to check whether code that uses the
‘register-path’ procedure still works after this change.  That includes
things like ‘guix pack -f --localstatedir foo’ and also ‘guix system
disk-image’.  A few things will need to be adjusted to pull in the new
modules and guile-sqlite3.

Alternately, you could move on to other areas, such as the functionality
roughly corresponding to libstore/build.cc.

Anyway, great work, thank you!

Ludo’.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]