guix-devel

Re: Guix on clusters and in HPC


From: myglc2
Subject: Re: Guix on clusters and in HPC
Date: Mon, 31 Oct 2016 20:11:45 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

On 10/26/2016 at 14:00 Ludovic Courtès writes:

> myglc2 <address@hidden> skribis:
>
>> The scheduler that I am most familiar with, SGE, assumes that
>> compute hosts are heterogeneous and that each has a fixed software
>> and/or hardware configuration. As a result, users need to specify the
>> resources a given job needs, such as software packages, number of CPUs,
>> and/or memory. These requirements in turn control where the job can
>> run. QMAKE, the integration of GNU Make with the SGE scheduler, further
>> allows each make recipe step to specify the resources for the SGE job
>> that runs that step.
>
> I see.
>
>> While SGE is dated and can be a bear to use, it provides a useful
>> yardstick for HPC/Cluster functionality. So it is useful to consider how
>> Guix(SD) might impact this model. Presumably a defining characteristic
>> of GuixSD clusters is that the software configuration of compute hosts
>> no longer needs to be fixed and the user can "dial in" a specific SW
>> configuration for each job step.  This is in many ways a good thing. But
>> it also generates new requirements. How does one specify the SW config
>> for a given job or recipe step:
>>
>> 1) VM image?
>>
>> 2) VM?
>>
>> 3) Installed System Packages?
>>
>> 4) Installed (user) packages?
>
> The ultimate model here would be that of offloading¹: users would use
> Guix on their machine, compute the derivation they want to build
> locally, and offload the actual build to the cluster.  In turn, the
> cluster would schedule builds on the available and matching compute
> nodes.  But of course, this is quite sophisticated.
>
> ¹
> https://www.gnu.org/software/guix/manual/html_node/Daemon-Offload-Setup.html
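
To make the "schedule builds on the available and matching compute nodes" step concrete, here is a minimal Python sketch of how a cluster front end might pick a node for an offloaded build. It assumes each node advertises a system type (e.g. "x86_64-linux"), a set of feature strings, a relative speed, and a current load. The `Node` class, `pick_node` function, and the "lowest load divided by speed" heuristic are illustrative assumptions, not the actual Guix offload code.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A compute node as the (hypothetical) scheduler sees it."""
    name: str
    system: str                               # e.g. "x86_64-linux"
    features: set = field(default_factory=set)
    speed: float = 1.0                        # relative speed; higher is faster
    load: float = 0.0                         # current load average

def pick_node(nodes, system, required_features=frozenset()):
    """Return the matching node with the lowest speed-normalized load,
    or None if no node can build for `system` with those features."""
    candidates = [
        n for n in nodes
        if n.system == system and set(required_features) <= n.features
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.load / n.speed)

nodes = [
    Node("node1", "x86_64-linux", {"kvm"}, speed=1.0, load=0.5),
    Node("node2", "x86_64-linux", set(), speed=2.0, load=0.6),
    Node("node3", "armhf-linux", set(), speed=1.0, load=0.0),
]
print(pick_node(nodes, "x86_64-linux").name)           # node2 (0.6/2.0 < 0.5/1.0)
print(pick_node(nodes, "x86_64-linux", {"kvm"}).name)  # node1 (only node with kvm)
```

Note that matching on system type and features plays the role that SGE resource requests play, while the software configuration itself comes from the derivation rather than from the node.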

Thanks for pointing me to this. I hadn't internalized (and probably
don't yet understand) just how cool the Offload Facility is. Sorry if my
earlier comments were pedantic or uninformed as a result :-(

Considering the Offload Facility as an SGE replacement, it would be
interesting to make a Venn diagram of the SGE and Guix Offload Facility
functions and to study the usability issues of each. Having failed that
assignment, here are a few thoughts:

I guess the QMAKE makefile mentioned above would be replaced by Guile
recipe(s)? Maybe we need a cheat sheet showing how to map between the
two sets of functions and concepts?

I am a little unclear about the implications of placing all analysis
results into the store. In labs where I have worked, data sources and
destinations are typically managed by project. What are the pros and
cons of keeping everything in the store? What are the management and
maintenance issues? E.g., when any result can be reproducibly derived,
only the inputs are precious, but maybe we want to protect from GC those
results that were computationally more expensive?

In Grid Engine, "submit hosts" are the machines that users log into to
gain access to the cluster. Usually there are one or two such hosts,
partly to simplify cluster access control. I take you to be saying that
when every user machine is set up to use the offload facility, it
effectively becomes a "submit host." In general, this would be nicer,
but not sufficient. Grid Engine also provides a 'qrsh' command that lets
users log into compute host(s) while reserving the same resources a
given job would require. This is useful when debugging a failing process
or prototyping one that needs memory, CPUs, or other resources not
available on the user's machine. Could the offload facility be extended
to support something like this?
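
The reservation idea behind 'qrsh' can be sketched roughly as follows: an interactive session asks for the same CPUs and memory a batch job would, and gets a host only if that host currently has them free. `Host`, `Request`, `reserve`, and `release` are hypothetical names for illustration; real Grid Engine tracks many more resource types (queues, slots, consumables).

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_cpus: int
    free_mem_gb: int

@dataclass(frozen=True)
class Request:
    cpus: int
    mem_gb: int

def reserve(hosts, req):
    """Reserve `req` on the first host with enough free resources.
    Returns that host, or None if nothing fits right now."""
    for h in hosts:
        if h.free_cpus >= req.cpus and h.free_mem_gb >= req.mem_gb:
            h.free_cpus -= req.cpus
            h.free_mem_gb -= req.mem_gb
            return h
    return None

def release(host, req):
    """Return the resources when the interactive session (or job) ends."""
    host.free_cpus += req.cpus
    host.free_mem_gb += req.mem_gb

hosts = [Host("c01", free_cpus=4, free_mem_gb=16),
         Host("c02", free_cpus=32, free_mem_gb=256)]
job = Request(cpus=16, mem_gb=64)
h = reserve(hosts, job)                    # c01 has too few CPUs, so c02
print(h.name, h.free_cpus, h.free_mem_gb)  # c02 16 192
release(h, job)
```

Because the same `Request` is used for the batch job and the interactive session, the session sees the resources the job would see, which is the property that makes 'qrsh' useful for debugging.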

> A more directly usable approach is to simply let users manage profiles
> on the cluster using ‘guix package’ or ‘guix environment’.  Then they
> can specify the right profile or the right ‘guix environment’ command in
> their jobs.
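
As a rough illustration of specifying "the right 'guix environment' command" in a job, the Python sketch below generates a job script that runs a command inside `guix environment --pure --ad-hoc PACKAGES -- COMMAND`. The `job_script` helper is hypothetical, and the package names and file names in the example are only placeholders.

```python
import shlex

def job_script(packages, command, pure=True):
    """Return a shell script that runs `command` in an environment
    containing `packages` (hypothetical helper for a batch job)."""
    argv = ["guix", "environment"]
    if pure:
        argv.append("--pure")  # unset the submitting user's environment variables
    # --ad-hoc: put the listed packages themselves in the environment
    argv += ["--ad-hoc", *packages, "--", *command]
    return "#!/bin/sh\n" + " ".join(shlex.quote(a) for a in argv) + "\n"

print(job_script(["samtools", "bwa"], ["bwa", "mem", "ref.fa", "reads.fq"]), end="")
# #!/bin/sh
# guix environment --pure --ad-hoc samtools bwa -- bwa mem ref.fa reads.fq
```

Generating the script this way keeps the software specification next to the job, although it does not by itself pin which Guix revision resolves the package names.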

This seems quite powerful. How would one reproducibly specify which Guix
version to use (or record which version was used)?

Does this fit within the offload facility harness?
