Re: Suggestion: disable offloading for texlive builds on hydra?
From: Mark H Weaver
Subject: Re: Suggestion: disable offloading for texlive builds on hydra?
Date: Sun, 26 Oct 2014 12:07:13 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

address@hidden (Ludovic Courtès) writes:
> John Darrington <address@hidden> skribis:
>
>> On Sun, Oct 26, 2014 at 03:36:03AM -0400, Mark H Weaver wrote:
>> When texlive is built on hydra, the build slave that built it is tied up
>> for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
>> to be transferred back to hydra.
>>
>> By design, only one transfer can happen at a time from a given build
>> slave, so during those 12 hours, the build slave's CPU is left idle, and
>> typically another 3 built-but-not-yet-transferred packages must wait
>> until the texlive transfer finishes.
>>
>> Why is it designed like that? It seems like a poor design to me.
>
> The rationale was that, in general, you just slow everything down by
> sending several things at once.

I have my doubts that it would slow things down very much, if at all.
The number of parallel transfers would still be limited to a small
number, typically 4 per build slave. The expense associated with
running multiple processes on a CPU is mainly due to cache effects, but
I wouldn't expect that to be an issue with network connections,
especially when those connections are between the same two hosts. The
practice of using multiple connections is well established in web
browsers and IMAP clients, as long as the number is not too large.

We're losing a huge amount of available CPU capacity in our build farm
(probably over 30 machine-hours per texlive rebuild) in exchange for a
dubious increase in network efficiency.

The more I think about it, the more I agree with John that we've chosen
the wrong tradeoff here. I think we should remove those mutexes.
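
To make the trade-off concrete, here is a rough back-of-envelope model in Python. It is only a sketch under stated assumptions, not a description of how the offload code behaves: the link rate is inferred from "3 GiB in about 12 hours", the three other package sizes (10, 20 and 30 MiB) are made up for illustration, all four outputs are assumed to be ready at the same moment, and concurrent transfers are assumed to split the link equally with no per-connection overhead (which is exactly the cost Ludovic is worried about, so the model is optimistic on that point).

```python
def fifo_finish_times(sizes, bandwidth):
    """One transfer at a time, in queue order; returns each finish time."""
    finish, clock = [], 0.0
    for size in sizes:
        clock += size / bandwidth
        finish.append(clock)
    return finish


def shared_finish_times(sizes, bandwidth):
    """All transfers start together and split the link equally."""
    finish = [0.0] * len(sizes)
    # Under equal sharing, transfers complete in order of increasing size.
    remaining = sorted(range(len(sizes)), key=lambda i: sizes[i])
    clock, sent = 0.0, 0.0  # `sent`: amount each active transfer has moved
    while remaining:
        i = remaining.pop(0)
        active = len(remaining) + 1  # transfers sharing the link right now
        clock += (sizes[i] - sent) * active / bandwidth
        sent = sizes[i]
        finish[i] = clock
    return finish


# Assumed inputs: texlive at 3 GiB, plus three hypothetical small packages.
sizes = [3072, 10, 20, 30]      # MiB
bandwidth = 3072 / 12           # MiB per hour, inferred from 3 GiB in 12 h

fifo = fifo_finish_times(sizes, bandwidth)
shared = shared_finish_times(sizes, bandwidth)

for size, one_at_a_time, concurrent in zip(sizes, fifo, shared):
    print(f"{size:5d} MiB  serial: {one_at_a_time:6.2f} h  "
          f"shared: {concurrent:6.2f} h")
```

In this idealized model the last byte arrives at the same time either way, since the link carries the same total amount of data. What changes is how long the small outputs wait: behind texlive they are stuck for the full 12 hours, while with a shared link they are done in well under an hour.
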
> diff --git a/gnu/packages/texlive.scm b/gnu/packages/texlive.scm
> index e562b02..bc0ece7 100644
> --- a/gnu/packages/texlive.scm
> +++ b/gnu/packages/texlive.scm
> @@ -88,7 +88,7 @@
>         ("pkg-config" ,pkg-config)
>         ("python" ,python-2) ; incompatible with Python 3 (print syntax)
>         ("tcsh" ,tcsh)))
> -   (outputs '("out" "data"))
> +   (outputs '("out" "data" "doc"))
>     (arguments
>      `(#:out-of-source? #t
>        #:configure-flags
>
>
> Data point: there’s 1.6 GiB in texmf-dist/doc (which the patch above
> splits out), and 1.4 GiB in texmf-dist/fonts.

I'd definitely be in favor of splitting out the docs.

> Another option Andreas and I discussed a while back would be to use a
> fixed-output derivation for the data, since it’s really what it is.
> That’s a bit hacky though: we’d have to install it, compute the hash of
> the installed files, and then use that as the derivation’s output hash.

Hmm. It is indeed a hack, but maybe worth considering. When I think
about Guix users downloading over 3 GiB from our humble hydra quite
often just to have TeX, it makes me worry about our bandwidth
requirements.

Thanks,
Mark