bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24496: offloading should fall back to local build after n tries


From: Ludovic Courtès
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Wed, 05 Oct 2016 13:36:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

ng0 <address@hidden> skribis:

> Ludovic Courtès <address@hidden> writes:

[...]

>> Like you say, on Hydra-style setup this could be a problem: the
>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>> builds on its own.
>>
>> So I guess we would need a command-line option to select a different
>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>> option.
>
> Could the daemon run with --enable-hydra-style or --disable-hydra-style
> and --disable-hydra-style would allow falling back to local build if
> after a defined time - keeping slow connections in mind - the machine
> did not reply.

That would be too ad-hoc IMO, and the problem mentioned above remains.

>> In the meantime, you could also hack up your machines.scm: it would
>> return a list where unreachable machines have been filtered out.
>
> How can I achieve this?

Something like:

  (define the-machine (build-machine …))

  (if (managed-to-connect-timely the-machine)
      (list the-machine)
      '())

… where ‘managed-to-connect-timely’ would try to connect to the
machine with a timeout.

> And to append to this bug: it seems to me that offloading requires 1
> lsh-key for each
> build-machine.

The main machine needs to be able to connect to each build machine over
SSH, so indeed, that requires proper SSH key registration (host keys and
authorized user keys).

> (https://lists.gnu.org/archive/html/help-guix/2016-10/msg00007.html)
> and that you can not directly address them (say I want to create some
> system where I want to build on machine 1 AND machine 2. Having 2
> x86_64 in machines.scm only selects one of them (if 2 were working,
> see linked thread) and builds on the one which is accessible first. If
> however the first machine is somehow blocked and it fails, therefore
> terminates lsh connection, the build does not happen at all.

The code that selects machines is in (guix scripts offload),
specifically ‘choose-build-machine’.  It tries to choose the “best”
machine, which means, roughly, the fastest and least loaded one.

HTH,
Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]