myexperiment-discuss


From: Paolo Romano
Subject: [Myexperiment-discuss] Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges
Date: Thu, 21 Aug 2008 11:53:55 +0200

At 11:31 21/08/2008, Phillip Lord wrote:

>>>>> "KC" == Kei Cheung <address@hidden> writes:

  KC> If some journals are requiring raw data (e.g., microarray data) to be
  KC> submitted to a public data repository, I wonder if workflows that are
  KC> used to analyze the data should also be submitted to a public workflow
  KC> repository.

It's a nice idea but doesn't quite allow the same level of repeatability. Most
Taverna workflows need updating periodically, as the services go offline or
change their interfaces. Even if they don't, they return different results as
the implementation changes.

Ultimately, you need to store more than the workflow to allow any degree of
repeatability. Still, it would be a good step forward, which is no bad thing.

You are right, and I think this is a serious problem not only for the workflow
approach to data analysis, but for all bioinformatics procedures.

We should find a way to fully describe a bioinformatics data analysis by
specifying not only the tools used (software, databases involved, parameters,
I/O), but also meta-information about them: software version and
implementation, host operating system, database version, server software and
its version and implementation, site accessed, date of access, etc. All this
information would at least support a better specification of the procedure,
although repeating the analysis would still be difficult, because databases are
updated frequently and previous releases are hard to keep on-line.

At the same time, it would be nice to see how the results of an analysis change
over time, as new data become available in the databases.
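
To make the idea a bit more concrete, here is a minimal Python sketch of the kind
of record described above. It is only an illustration under my own assumptions:
the class name ToolRecord, its field names, the helper functions and all sample
values are hypothetical placeholders, not an existing standard or any
myExperiment/Taverna format.

```python
from dataclasses import dataclass, field, asdict, replace
import hashlib
import json


@dataclass(frozen=True)
class ToolRecord:
    """One tool or service call in an analysis (hypothetical field names)."""
    name: str                # software or web service used
    version: str             # software version and implementation
    operating_system: str    # operating system hosting the tool
    database: str            # database involved
    database_version: str    # database release that was queried
    site: str                # site accessed
    access_date: str         # date of access, ISO 8601
    parameters: dict = field(default_factory=dict)  # parameters used


def fingerprint(record: ToolRecord) -> str:
    """SHA-256 of the record's JSON form; equal records give equal hashes."""
    payload = json.dumps(asdict(record), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_fields(old: ToolRecord, new: ToolRecord) -> dict:
    """Map each differing field to its (old, new) pair of values."""
    a, b = asdict(old), asdict(new)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Placeholder values only; none of these refer to real tools or releases.
first_run = ToolRecord(
    name="example-aligner",
    version="1.0",
    operating_system="example-os",
    database="example-db",
    database_version="2008_08",
    site="https://example.org/service",
    access_date="2008-08-21",
    parameters={"threshold": 0.001},
)

# The same analysis repeated later against a newer database release.
later_run = replace(first_run, database_version="2009_01",
                    access_date="2009-01-15")

print(changed_fields(first_run, later_run))
```

Storing such a record next to each result would not make the analysis
repeatable by itself, but comparing two records at least shows which part of
the environment changed between runs (here, the database release and the access
date), which is a first step towards explaining why the results differ.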

Paolo

Paolo Romano (address@hidden)
Bioinformatics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288  Fax: +39-010-5737-295




