Re: Use guix to distribute data & reproducible (data) science

guix-devel


From:	Amirouche Boubekki
Subject:	Re: Use guix to distribute data & reproducible (data) science
Date:	Sat, 10 Feb 2018 09:51:41 +0000

On Fri, Feb 9, 2018 at 8:16 PM Konrad Hinsen <address@hidden> wrote:

> Hi,
>
> On 09/02/2018 18:13, Ludovic Courtès wrote:

>> Amirouche Boubekki <address@hidden> wrote:
>>
>>> tl;dr: Distribution of data and software seems similar.
>>> Data is more and more important in software and reproducible
>>> science. The data science ecosystem lacks resource sharing.
>>> I think guix can help.
>>
>> Now, whether Guix is the right tool to distribute data, I don’t know.
>> Distributing large amounts of data is a job in itself, and the store
>> isn’t designed for that. It could quickly become a bottleneck. That’s
>> one of the reasons why the Guix Workflow Language (GWL) does not store
>> scientific data in the store itself.

> and then distributed via standard channels (Zenodo, ...)

Thanks for the pointer!

> For big datasets, some other mechanism is required.

Big as in bigger than RAM?

> I think it's worth thinking carefully about how to exploit guix for
> reproducible computations. As Lispers know very well, code is data and
> data is code. Building a package is a computation like any other.

What I was thinking about is using Guix to distribute data packages just like
we distribute software from PyPI. The advantage of using Guix seems obvious,
but apparently it's not desirable or possible, and I don't understand why.
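To make the PyPI comparison concrete: a data package could be described by the same kind of fields a Guix package `origin` records for a source tarball, namely where to fetch it and the SHA-256 hash of its contents, so anyone can check they received exactly the bytes the author published. A minimal Python sketch, assuming a local file has already been downloaded; `DataPackage` and its fields are hypothetical names, not a Guix or PyPI API, and the URL is a placeholder.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataPackage:
    """Hypothetical descriptor for a published dataset."""
    name: str
    version: str
    url: str     # where the data would be fetched from (e.g. a Zenodo record)
    sha256: str  # expected hex digest of the file's contents

    def verify(self, path: Path) -> bool:
        """Return True if the file at `path` has the expected SHA-256 digest."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so files larger than RAM can still be checked.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest() == self.sha256


# Usage: pretend this temporary file is the downloaded dataset.
data = b"x,y\n1,2\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

pkg = DataPackage("toy-table", "1.0", "https://example.org/toy.csv",
                  hashlib.sha256(data).hexdigest())
print(pkg.verify(Path(tmp.name)))  # True
```

The hash, not the URL, is what pins the data: the same descriptor stays valid if the file moves to another mirror.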

> Scientific workflows could be handled by a specific build system. In
> fact, as long as no big datasets or multiple processors are involved, we
> can do this right now, using standard package declarations.

Ok, good to know.
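Treating a workflow step as "just another build" can be sketched in a few lines: each step gets an identifier derived from a hash of its recipe and of its inputs' identifiers, and a step whose identifier is already cached is not recomputed. This loosely mimics how a Guix derivation makes a build's output depend on all of its inputs, but the hashing scheme here is not Guix's; `Step`, `step_key`, `run` and the toy actions are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    name: str                        # which action performs this step
    recipe: str                      # description of the computation; part of the key
    inputs: Tuple["Step", ...] = ()  # upstream steps whose outputs are consumed


def step_key(step: Step) -> str:
    """Identify a step by hashing its name, recipe and the keys of its inputs."""
    h = hashlib.sha256(f"{step.name}\n{step.recipe}".encode())
    for dep in step.inputs:
        h.update(step_key(dep).encode())
    return h.hexdigest()


def run(step: Step, actions: Dict[str, Callable[..., Any]], cache: Dict[str, Any]) -> Any:
    """Return the step's output, computing it only if its key is not cached."""
    key = step_key(step)
    if key not in cache:
        args = [run(dep, actions, cache) for dep in step.inputs]
        cache[key] = actions[step.name](*args)
    return cache[key]


calls: List[str] = []  # records which actions actually ran


def load() -> List[int]:
    calls.append("raw")
    return [1, -2, 3, 5]


def drop_negatives(xs: List[int]) -> List[int]:
    calls.append("clean")
    return [x for x in xs if x >= 0]


def mean(xs: List[int]) -> float:
    calls.append("mean")
    return sum(xs) / len(xs)


actions = {"raw": load, "clean": drop_negatives, "mean": mean}
raw = Step("raw", "load measurements v1")
clean = Step("clean", "drop negative values", (raw,))
result = Step("mean", "arithmetic mean", (clean,))

cache: Dict[str, Any] = {}
print(run(result, actions, cache))  # 3.0
print(run(result, actions, cache))  # 3.0 again, served from the cache
print(calls)                        # ['raw', 'clean', 'mean']
```

Changing the recipe of the `raw` step (say, to a new dataset version) changes the key of every step downstream of it, so exactly those steps would be recomputed.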

> It would be nice if big datasets could conceptually be handled in the
> same way while being stored elsewhere - a bit like git-annex does for
> git.

Thanks again for the pointer.
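The git-annex idea can be approximated with a pointer scheme: the repository (or package) holds only a small text pointer naming the content's hash, while the bytes live in a separate store and are checked against that hash on retrieval. A minimal Python sketch, assuming a local directory stands in for the external store; the `annex:sha256:` pointer format and the `add`/`resolve` helpers are made up for illustration and are not git-annex's key format or commands.

```python
import hashlib
import tempfile
from pathlib import Path

PREFIX = "annex:sha256:"  # hypothetical pointer format


def add(store: Path, data: bytes) -> str:
    """Put `data` into the external store and return a small pointer string."""
    digest = hashlib.sha256(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    (store / digest).write_bytes(data)  # content-addressed: the file name is the hash
    return PREFIX + digest


def resolve(store: Path, pointer: str) -> bytes:
    """Fetch the content named by `pointer`, refusing data that does not match."""
    if not pointer.startswith(PREFIX):
        raise ValueError(f"not a pointer: {pointer!r}")
    digest = pointer[len(PREFIX):]
    data = (store / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored content does not match its hash")
    return data


store = Path(tempfile.mkdtemp()) / "annex-store"
ptr = add(store, b"a large dataset, in principle")
print(ptr)                  # small and fixed-size, whatever the data size
print(resolve(store, ptr))
```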

> And for parallel computing, we could have special build daemons.

That's where GWL comes in?
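Running a workflow on several processors is largely a scheduling question: any step whose inputs are all available can run alongside the others. A minimal Python sketch of such a scheduler, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool; it shows only the scheduling idea, not how a Guix build daemon or the GWL actually distributes work, and `run_parallel` and the toy steps are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set


def run_parallel(graph: Dict[str, Set[str]],
                 actions: Dict[str, Callable[[Dict[str, Any]], Any]],
                 max_workers: int = 4) -> Dict[str, Any]:
    """Run every step as soon as all the steps it depends on have finished.

    `graph` maps each step to the set of steps it depends on; each action
    receives a dict mapping dependency names to their results.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every step whose dependencies are all done.
            for node in sorter.get_ready():
                deps = {d: results[d] for d in graph.get(node, set())}
                running[pool.submit(actions[node], deps)] = node
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                node = running.pop(future)
                results[node] = future.result()
                sorter.done(node)
    return results


workflow = {
    "clean": {"raw"},
    "stats": {"clean"},
    "plot": {"clean"},  # "stats" and "plot" are independent and may run in parallel
}
steps = {
    "raw": lambda deps: [1, -2, 3],
    "clean": lambda deps: [x for x in deps["raw"] if x >= 0],
    "stats": lambda deps: sum(deps["clean"]),
    "plot": lambda deps: len(deps["clean"]),
}
out = run_parallel(workflow, steps)
print(out["stats"], out["plot"])  # 4 2
```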


