guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] git-download: Speed up 'git-predicate'.


From: Christopher Baines
Subject: Re: [PATCH] git-download: Speed up 'git-predicate'.
Date: Sun, 16 Jul 2017 11:42:20 +0100

On Wed, 21 Jun 2017 23:44:26 +0200
address@hidden (Ludovic Courtès) wrote:

> Hello,
> 
> Christopher Baines <address@hidden> skribis:
> 
> > +(define (create-directory-tree files)
> > +  (define (directory-lists->tree directory-lists)
> > +    (map (lambda (top-level-dir)
> > +           (cons top-level-dir
> > +                 (directory-lists->tree
> > +                  (filter-map
> > +                   (lambda (directory-list)
> > +                     (if (eq? (length directory-list) 1)
> > +                         #f
> > +                         (cdr directory-list)))
> > +                   ;; Find all the directory lists under this
> > top-level-dir
> > +                   (filter
> > +                    (lambda (directory-list)
> > +                      (equal? (car directory-list)
> > +                              top-level-dir))
> > +                    directory-lists)))))
> > +         (delete-duplicates
> > +          (map car directory-lists))))
> > +
> > +  (directory-lists->tree
> > +   (filter-map (lambda (path)
> > +                 (let ((split-path (string-split path #\/)))
> > +                   ;; If this is a file in the top of the
> > repository?
> > +                   (if (eq? (length split-path) 1)
> > +                       #f
> > +                       ;; drop-right to remove the filename, as
> > it's
> > +                       ;; just the directory tree that's important
> > +                       (drop-right (string-split path #\/) 1))))
> > +               files)))
> > +
> > +(define (directory-in-tree? directory whole-tree)
> > +  (define directory-list-in-tree?
> > +    (match-lambda*
> > +      (((top-directory . rest) tree)
> > +       (if (null? rest)
> > +           (list? (member top-directory (map car tree)))
> > +           (and=> (find (match-lambda
> > +                          ((subtree-top-directory . subtree)
> > +                           (equal? subtree-top-directory
> > +                                   top-directory)))
> > +                        tree)
> > +                  (match-lambda
> > +                    ((subtree-top-directory . subtree)
> > +                     (directory-list-in-tree? rest subtree))))))))
> > +
> > +  (directory-list-in-tree? (string-split directory #\/)
> > +                           whole-tree))  
> 
> I tried it to fully understand what was going on.  While playing with
> it I came up with a variant that is slightly more concise and clearer
> (at least to me ;-)):
> 
> --8<---------------cut here---------------start------------->8---
> (define (files->directory-tree files)
>   "Return a tree of vhashes representing the directory listed in
> FILES, a list like '(\"a/b\" \"b/c/d\")."
>   (fold (lambda (file result)
>           (let loop ((file (string-split file #\/))
>                      (result result))
>             (match file
>               ((_)
>                result)
>               ((directory children ...)
>                (match (vhash-assoc directory result)
>                  (#f
>                   (vhash-cons directory (loop children vlist-null)
>                               result))
>                  ((_ . previous)
>                   ;; XXX: 'vhash-delete' is O(n).
>                   (vhash-cons directory (loop children previous)
>                               (vhash-delete directory result)))))
>               (()
>                result))))
>         vlist-null
>         files))
> 
> (define (directory-in-tree? tree directory)
>   "Return true if DIRECTORY, a string like \"a/b\", denotes a
> directory listed in TREE."
>   (let loop ((directory (string-split directory #\/))
>              (tree       tree))
>     (match directory
>       (()
>        #t)
>       ((head . tail)
>        (match (vhash-assoc head tree)
>          ((_ . sub-tree) (loop tail sub-tree))
>          (#f #f))))))
> --8<---------------cut here---------------end--------------->8---

Thanks for looking at this Ludo, and apologies for taking so long to
reply. These functions are definately more concise. 

> The tree is a tree of vhash, which should make lookup (i.e.,
> ‘directory-in-tree?’) slightly faster.  So it works like this:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (files->directory-tree '("a" "a/b" "a/b/x" "a/c"
> "b" "b/c")) $17 = #<vhash 2802a40 2 pairs>
> scheme@(guile-user)> (directory-in-tree? $17 "a/b")
> $18 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a")
> $19 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a/b/x")
> $21 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "a/c")
> $22 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "b")
> $23 = #t
> --8<---------------cut here---------------end--------------->8---
> 
> WDYT?  If you like it, take it and adjust as you see fit.  We’re doing
> 4-hand coding like whichever fancy methodology recommends.  ;-)

I've had a little look at the performance, and I think this approach of
using a vhash might be slower, but not by much. The biggest difference
I saw for the guix repository was 0.008 seconds, and for smart-answers
was 0.6 seconds.

I'm happy to go ahead with either though, obviously these performance
differences don't matter compared to the current performance for large
repositories.

Attachment: pgpWPry1QqzxC.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]