myexperiment-discuss



From: Kei Cheung
Subject: [Myexperiment-discuss] Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges
Date: Thu, 21 Aug 2008 18:07:27 -0400
User-agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)

Paolo Romano wrote:

At 11:31 21/08/2008, Phillip Lord wrote:

>>>>> "KC" == Kei Cheung <address@hidden> writes:

  KC> If some journals are requiring raw data (e.g., microarray data) to be
  KC> submitted to a public data repository, I wonder if workflows that are
  KC> used to analyze the data should also be submitted to a public workflow
  KC> repository.

It's a nice idea, but it doesn't quite allow the same level of repeatability. Most Taverna workflows need updating periodically, as the services go offline or change their interfaces. Even if they don't, they return different results as the implementation changes.

Ultimately, you need to store more than the workflow to allow any degree of repeatability. Still, it would be a good step forward, which is no bad thing.


You are right, and I think this really is a serious problem, not only with the workflow approach to data analysis but with all bioinformatics procedures.

We should find a way to describe a bioinformatics data analysis fully, specifying not only the tools used (software, databases involved, parameters, I/O) but also metadata about them: software version and implementation, host operating system, database version, server software with its version and implementation, site accessed, date of access, etc. All this information would at least support a better specification of the procedure. Repeatability of the analysis would still be difficult, because databases are updated frequently and previous releases are hard to keep online.
At the same time, it would be nice to see how the results of an analysis change over time, as new data becomes available in the databases.
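
As a rough illustration of the kind of description suggested above, the metadata could be captured in a small structured record stored alongside the results. This is only a sketch in Python: the class names, field names, and example values (`example-aligner`, `example-db`, the URL) are hypothetical and do not come from any existing standard or tool.

```python
import json
import platform
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolRecord:
    """One software tool used in the analysis (hypothetical schema)."""
    name: str
    version: str
    parameters: dict = field(default_factory=dict)


@dataclass
class DatabaseRecord:
    """One database consulted during the analysis (hypothetical schema)."""
    name: str
    release: str
    site: str
    accessed: str  # date of access, ISO 8601


@dataclass
class AnalysisRecord:
    """Tools and databases plus the environment the analysis ran in."""
    tools: list
    databases: list
    operating_system: str = field(default_factory=platform.platform)
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # asdict() converts nested dataclasses inside lists as well
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example values, for illustration only
record = AnalysisRecord(
    tools=[ToolRecord("example-aligner", "1.2.3", {"evalue": 1e-5})],
    databases=[
        DatabaseRecord("example-db", "2008_08", "https://example.org/db", "2008-08-21")
    ],
)
print(record.to_json())
```

A record like this does not make the analysis repeatable by itself, but it states exactly which versions were used, which is the "better specification" part of the argument.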

Paolo

Paolo Romano (address@hidden)
Bioinformatics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288  Fax: +39-010-5737-295

Since the data (e.g., genome annotation) used in an analysis pipeline (workflow) may evolve over time, the provenance of the workflow may need to include the version of each data source involved in the analysis, in addition to the raw data.
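
One possible use of recorded data versions, sketched in Python under the assumption that each data source can be identified by a name and a version string: before re-running a workflow, compare the versions recorded at analysis time with those currently available and flag any that differ. The function name, source names, and version labels below are hypothetical.

```python
def changed_versions(recorded: dict, current: dict) -> dict:
    """Map each data source whose version differs (or is now missing)
    to a (recorded_version, current_version) pair."""
    changes = {}
    for name, old_version in recorded.items():
        new_version = current.get(name)  # None if the source is gone
        if new_version != old_version:
            changes[name] = (old_version, new_version)
    return changes


# Made-up version labels, for illustration only
recorded = {"genome_annotation": "release-49", "raw_microarray": "submission-1"}
current = {"genome_annotation": "release-50", "raw_microarray": "submission-1"}

print(changed_versions(recorded, current))
# prints {'genome_annotation': ('release-49', 'release-50')}
```

An empty result would suggest the inputs are unchanged; a non-empty one points to where a re-run may give different results.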

-Kei



