guix-devel

Re: Use guix to distribute data & reproducible (data) science


From: Amirouche Boubekki
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Sat, 10 Feb 2018 09:51:41 +0000



On Fri, Feb 9, 2018 at 8:16 PM Konrad Hinsen <address@hidden> wrote:
Hi,

On 09/02/2018 18:13, Ludovic Courtès wrote:

> Amirouche Boubekki <address@hidden> skribis:
>
>> tl;dr: Distribution of data and software seems similar.
>>         Data is more and more important in software and reproducible
>>         science. The data science ecosystem lacks resource sharing.
>>         I think guix can help.
>
> Now, whether Guix is the right tool to distribute data, I don’t know.
> Distributing large amounts of data is a job in itself, and the store
> isn’t designed for that.  It could quickly become a bottleneck.  That’s
> one of the reasons why the Guix Workflow Language (GWL) does not store
> scientific data in the store itself.

and then distributed via standard channels (Zenodo, ...)

Thanks for the pointer!
 
For big datasets, some other mechanism is required.

Big as in bigger than RAM?
 
I think it's worth thinking carefully about how to exploit guix for
reproducible computations. As Lispers know very well, code is data and
data is code. Building a package is a computation like any other.
 
What I was thinking about is using guix to distribute data packages, just like
we distribute software from PyPI. The advantage of using guix seems obvious,
but apparently it's not desirable or possible, and I don't understand why.

Scientific workflows could be handled by a specific build system. In
fact, as long as no big datasets or multiple processors are involved, we
can do this right now, using standard package declarations.

Ok, good to know.
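
To make the "a computation is a build like any other" idea concrete, here is a minimal Python sketch. It is not how guix computes store paths; it only loosely mirrors the principle that an output location is derived from a hash of the declared inputs, so the same inputs always map to the same result and are never rebuilt. The names `store_key`, `build` and `mean_builder` are hypothetical and exist only for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_key(name, inputs):
    """Derive a key from the step name and its declared inputs."""
    payload = json.dumps({"name": name, "inputs": inputs}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{digest[:32]}-{name}"


def build(store, name, inputs, builder):
    """Run builder(inputs) once per distinct input set, then reuse the result."""
    out = Path(store) / store_key(name, inputs)
    if not out.exists():
        out.write_text(builder(inputs))
    return out


def mean_builder(inputs):
    """Toy 'scientific workflow' step: the mean of a list of numbers."""
    values = inputs["values"]
    return str(sum(values) / len(values))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        first = build(store, "mean", {"values": [1, 2, 3]}, mean_builder)
        again = build(store, "mean", {"values": [1, 2, 3]}, mean_builder)
        other = build(store, "mean", {"values": [1, 2, 4]}, mean_builder)
        # Same inputs give the same path; different inputs give a new one.
        print(first == again, first != other)
```

The sketch only works for small inputs that fit in the declaration itself, which matches the caveat above about big datasets needing another mechanism.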
 
It would be nice if big datasets could conceptually be handled in the
same way while being stored elsewhere - a bit like git-annex does for
git.

Thanks again for the pointer.
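
As a rough illustration of the git-annex comparison, the following Python sketch keeps only a small pointer file (hash and size) next to the code, while the dataset itself lives in some other location. This is an assumption-laden toy, not git-annex's or guix's actual format: the "remote" is just a local directory, and `add`, `fetch` and the JSON pointer layout are hypothetical names chosen for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def add(dataset, remote, pointer):
    """Copy the dataset to the remote under its hash; write a pointer file."""
    digest = sha256_of(dataset)
    Path(remote).mkdir(parents=True, exist_ok=True)
    shutil.copyfile(dataset, Path(remote) / digest)
    meta = {"sha256": digest, "size": Path(dataset).stat().st_size}
    Path(pointer).write_text(json.dumps(meta))
    return digest


def fetch(pointer, remote, dest):
    """Retrieve the data named by the pointer and verify its hash."""
    meta = json.loads(Path(pointer).read_text())
    shutil.copyfile(Path(remote) / meta["sha256"], dest)
    if sha256_of(dest) != meta["sha256"]:
        Path(dest).unlink()
        raise ValueError("hash mismatch for " + str(dest))
    return Path(dest)
```

Only the pointer file would need to be versioned or packaged; the verification step in `fetch` is what lets the data be stored elsewhere without giving up reproducibility.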

And for parallel computing, we could have special build daemons.

That's where GWL comes in?
 
