[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Optimizing union.scm
From: |
Mark H Weaver |
Subject: |
Re: Optimizing union.scm |
Date: |
Tue, 25 Mar 2014 18:30:42 -0400 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) |
address@hidden (Ludovic Courtès) writes:
> Mark H Weaver <address@hidden> skribis:
>
>> Here's a first draft of a new union.scm.
>
> Thanks for being so fast!
>
>> On my YeeLoong 8101B, when building my 155-package profile, it uses
>> about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as
>> much real time (2 minutes vs about 7.17 minutes). It also handles
>> libffi in core-updates properly. A few notes:
>
> That’s good news, and definitely an incentive to switch to this
> implementation.
>
>> * For now, I'm using a hash table, because it turned out to be rather
>> awkward to use a vhash for this. I can make it purely functional
>> later if it's deemed important.
>
> I would prefer it, but it must not be blocking.
Okay, I'll work on it.
>> * I've not yet updated tests/union.scm, which tests internal procedures
>> in union.scm that no longer exist.
>
> Right, the ‘tree-union’ and ‘delete-duplicate-leaves’ tests will have to
> be removed.
>
> However, could you add similar tests? They will have to use the real
> ‘union-build’, though. Maybe the ‘with-file-tree’ can be useful to
> write the tests.
Could you help with this? I'm a bit overloaded right now.
[...]
> Some stylistic comments:
>
>> (define (names-in-directory dirname)
>> (let ((dir (opendir dirname)))
>> (let loop ((filenames '()))
>> (match (readdir dir)
>> ((or "." "..")
>> (loop filenames))
>> ((? eof-object?)
>> (closedir dir)
>> filenames)
>> (name
>> (loop (cons name filenames)))))))
>
> Rather use something like:
>
> (scandir directory (lambda (file)
> (not (member file '("." "..")))))
>
> ‘scandir’ also has the advantage of being deterministic (it sorts
> entries.)
I looked at 'scandir', but it's very wasteful. For one thing, it
unconditionally calls 'lstat' on every directory entry, which entails
over ten thousand unnecessary 'lstat' system calls, a lot of unnecessary
I/O to read the inodes, the use of a vhash to keep track of where it has
been to detect cycles (since it's based on 'file-system-fold'), etc.
>> (define (directory? name)
>> (eq? 'directory (stat:type (stat name))))
>
> Rather ‘file-is-directory?’.
Okay.
>> (define* (union-build output inputs
>> #:key (log-port (current-error-port)))
>
> Docstring.
Oops, I removed yours by mistake.
>> (define (make-link input output)
>
> Maybe ‘symlink*’? (‘make-link’ sounds like it returns a newly-allocated
> object.)
Okay.
>> (format log-port "`~a' ~~> `~a'~%" input output)
>> (symlink input output))
>>
>> (define (union output inputs)
>> (match inputs
>> ((input)
>> ;; There's only one input, so just make a link.
>> (make-link input output))
>> (_
>> (receive (dirs files) (partition directory? inputs)
>
> Rather SRFI-11 let-values.
Hmm. In simple cases like this, I find 'receive' much more attractive
than 'let-values', since the latter would involve triple-nested parens,
which is a bit much even for me.
Can you explain why you prefer 'let-values' to 'receive' in this case?
Also, how do you feel about 'call-with-values'? In an earlier draft of
this code, I used 'call-with-values' with 'match-lambda*' as the
consumer, which eliminated the need for the 'cond'. Would you prefer
that?
>> (cond
>> ((null? files)
>>
>> ;; All inputs are directories. Create a new directory
>> ;; where we will merge the input directories.
>> (mkdir output)
>>
>> ;; Build a hash table mapping each name to a list of input
>> ;; directories containing that name.
>> (let ((table (make-hash-table)))
>>
>> (define (add-to-table! name dir)
>> (let ((handle (hash-create-handle! table name '())))
>> (set-cdr! handle (cons dir (cdr handle)))))
>
> Rather ‘hash-set!’ (‘set-cdr!’ is doomed, you know ;-)).
That would double the number of hash table lookups. Anyway, I plan to
eliminate the use of the hash table altogether, so the point is moot :)
>> (format (current-error-port)
>> "warning: collision encountered: ~{~a ~}~%"
>> files)
>>
>> ;; TODO: Implement smarter strategies.
>> (format (current-error-port)
>> "warning: arbitrarily choosing ~a~%"
>> file)
>
> Can we keep that as a separate, possibly inner ‘resolve-collisions’
> procedure? That makes it clear what to extend, when we want to
> implement that (the original idea was that ‘union-build’ could be passed
> a collision-resolution procedure.)
>
>> (make-link file output))))
>>
>> (else
>> ;; The inputs are a mixture of files and directories
>> (error "union-build: collision between file and directories"
>> `((files ,files) (dirs ,dirs)))))))))
>
> Same for this one.
Okay.
Thanks for the quick review :)
Mark
- Re: Problems with handicapped 'bash' from glibc package, Ludovic Courtès, 2014/03/23
- Re: Problems with handicapped 'bash' from glibc package, Mark H Weaver, 2014/03/23
- Re: Problems with handicapped 'bash' from glibc package, Ludovic Courtès, 2014/03/23
- Optimizing union.scm, Mark H Weaver, 2014/03/23
- Re: Optimizing union.scm, Ludovic Courtès, 2014/03/24
- Re: Optimizing union.scm, Mark H Weaver, 2014/03/25
- Re: Optimizing union.scm, Ludovic Courtès, 2014/03/25
- Re: Optimizing union.scm,
Mark H Weaver <=
- Re: Optimizing union.scm, Ludovic Courtès, 2014/03/25
- Re: Optimizing union.scm, Mark H Weaver, 2014/03/27
- Re: Optimizing union.scm, Ludovic Courtès, 2014/03/27
Re: Problems with handicapped 'bash' from glibc package, Ludovic Courtès, 2014/03/26