Re: [rb-general] Paper preprint: Reproducible genomics analysis pipeline
From: Ludovic Courtès
Subject: Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
Date: Mon, 23 Apr 2018 10:20:26 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
Hello Ricardo & all!
Ricardo Wurmus <address@hidden> writes:
> I’m happy to announce that the group I’m working with has released a
> preprint of a paper on reproducibility with the title:
>
> Reproducible genomics analysis pipelines with GNU Guix
> https://www.biorxiv.org/content/early/2018/04/11/298653
>
> We built a collection of bioinformatics pipelines and packaged them with
> GNU Guix, and then looked at the degree to which the software achieves
> bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
> (e.g. time stamps), discussed experimental reproducibility at runtime
> (e.g. random number generators, kernel+glibc interface, etc) and
> commented on the idea of using “containers” (or application bundles)
> instead.
Very impressive piece of work! I think it’s important to stress that
reproducible builds are a crucial foundation for reproducible
computational experiments, and this paper does a great job of that.
Also nice that you show you can have these bit-reproducible pipelines
formalized in Guix *and* produce a ready-to-use “container image.”
Hopefully we can soon address the remaining sources of non-determinism
shown in Table 3 (I think you already addressed some of them in the
meantime, didn’t you?).
The bit I’m less comfortable with is Autotools. I do understand how it
helps capture configure-time dependencies, and how it generally helps
people package and use the software; I think it’s one of the best tools
for the job. However, it’s also hard to learn and, whether it’s
justified or not, it’s considered “scary.”
Given the intended audience, I wonder how we could provide a simpler
path to achieve the same goal. It could be a set of Autoconf macros
leading to high-level ‘configure.ac’ files without a single line of
shell code, or it could be Guix interpreting a top-level .scm or JSON
file, both of which would ideally be easier for bioinformaticians to
write.
What are your thoughts on this?
Anyway, kudos on this, thank you!
Ludo’.