[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] git-download: Speed up 'git-predicate'.
From: |
Christopher Baines |
Subject: |
Re: [PATCH] git-download: Speed up 'git-predicate'. |
Date: |
Sun, 16 Jul 2017 11:42:20 +0100 |
On Wed, 21 Jun 2017 23:44:26 +0200
address@hidden (Ludovic Courtès) wrote:
> Hello,
>
> Christopher Baines <address@hidden> skribis:
>
> > +(define (create-directory-tree files)
> > + (define (directory-lists->tree directory-lists)
> > + (map (lambda (top-level-dir)
> > + (cons top-level-dir
> > + (directory-lists->tree
> > + (filter-map
> > + (lambda (directory-list)
> > + (if (eq? (length directory-list) 1)
> > + #f
> > + (cdr directory-list)))
> > + ;; Find all the directory lists under this
> > top-level-dir
> > + (filter
> > + (lambda (directory-list)
> > + (equal? (car directory-list)
> > + top-level-dir))
> > + directory-lists)))))
> > + (delete-duplicates
> > + (map car directory-lists))))
> > +
> > + (directory-lists->tree
> > + (filter-map (lambda (path)
> > + (let ((split-path (string-split path #\/)))
> > + ;; If this is a file in the top of the
> > repository?
> > + (if (eq? (length split-path) 1)
> > + #f
> > + ;; drop-right to remove the filename, as
> > it's
> > + ;; just the directory tree that's important
> > + (drop-right (string-split path #\/) 1))))
> > + files)))
> > +
> > +(define (directory-in-tree? directory whole-tree)
> > + (define directory-list-in-tree?
> > + (match-lambda*
> > + (((top-directory . rest) tree)
> > + (if (null? rest)
> > + (list? (member top-directory (map car tree)))
> > + (and=> (find (match-lambda
> > + ((subtree-top-directory . subtree)
> > + (equal? subtree-top-directory
> > + top-directory)))
> > + tree)
> > + (match-lambda
> > + ((subtree-top-directory . subtree)
> > + (directory-list-in-tree? rest subtree))))))))
> > +
> > + (directory-list-in-tree? (string-split directory #\/)
> > + whole-tree))
>
> I tried it to fully understand what was going on. While playing with
> it I came up with a variant that is slightly more concise and clearer
> (at least to me ;-)):
>
> --8<---------------cut here---------------start------------->8---
> (define (files->directory-tree files)
> "Return a tree of vhashes representing the directory listed in
> FILES, a list like '(\"a/b\" \"b/c/d\")."
> (fold (lambda (file result)
> (let loop ((file (string-split file #\/))
> (result result))
> (match file
> ((_)
> result)
> ((directory children ...)
> (match (vhash-assoc directory result)
> (#f
> (vhash-cons directory (loop children vlist-null)
> result))
> ((_ . previous)
> ;; XXX: 'vhash-delete' is O(n).
> (vhash-cons directory (loop children previous)
> (vhash-delete directory result)))))
> (()
> result))))
> vlist-null
> files))
>
> (define (directory-in-tree? tree directory)
> "Return true if DIRECTORY, a string like \"a/b\", denotes a
> directory listed in TREE."
> (let loop ((directory (string-split directory #\/))
> (tree tree))
> (match directory
> (()
> #t)
> ((head . tail)
> (match (vhash-assoc head tree)
> ((_ . sub-tree) (loop tail sub-tree))
> (#f #f))))))
> --8<---------------cut here---------------end--------------->8---
Thanks for looking at this Ludo, and apologies for taking so long to
reply. These functions are definately more concise.
> The tree is a tree of vhash, which should make lookup (i.e.,
> ‘directory-in-tree?’) slightly faster. So it works like this:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (files->directory-tree '("a" "a/b" "a/b/x" "a/c"
> "b" "b/c")) $17 = #<vhash 2802a40 2 pairs>
> scheme@(guile-user)> (directory-in-tree? $17 "a/b")
> $18 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a")
> $19 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a/b/x")
> $21 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "a/c")
> $22 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "b")
> $23 = #t
> --8<---------------cut here---------------end--------------->8---
>
> WDYT? If you like it, take it and adjust as you see fit. We’re doing
> 4-hand coding like whichever fancy methodology recommends. ;-)
I've had a little look at the performance, and I think this approach of
using a vhash might be slower, but not by much. The biggest difference
I saw for the guix repository was 0.008 seconds, and for smart-answers
was 0.6 seconds.
I'm happy to go ahead with either though, obviously these performance
differences don't matter compared to the current performance for large
repositories.
pgpWPry1QqzxC.pgp
Description: OpenPGP digital signature
- Re: [PATCH] git-download: Speed up 'git-predicate'.,
Christopher Baines <=