At 11:31 21/08/2008, Phillip Lord wrote:
>>>>> "KC" == Kei Cheung <address@hidden> writes:
KC> If some journals are requiring raw data (e.g., microarray data) to be
KC> submitted to a public data repository, I wonder if workflows that are
KC> used to analyze the data should also be submitted to a public workflow
KC> repository.
It's a nice idea, but it doesn't quite allow the same level of repeatability.
Most Taverna workflows need updating periodically, as the services go offline
or change their interfaces. Even if they don't, they return different results
as the implementation changes.

Ultimately, you need to store more than the workflow to allow any degree of
repeatability. Still, it would be a good step forward, which is no bad thing.

You are right, and I think this really is a serious problem, not only for the
workflow approach to data analysis, but for all bioinformatics procedures.
We should find a way to fully describe a bioinformatics data analysis, by
specifying not only the tools used (software programs, databases involved,
parameters, I/O), but also a good deal of metadata about them: software
version and implementation, host operating system, database version, server
software with its version and implementation, the site accessed, the date of
access, and so on.
All this information would at least support a better specification of the
procedure, while repeatability of the analysis would still be difficult,
given how frequently databases are updated and how hard it is to keep
previous releases online.
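
Just to make this concrete, here is a minimal sketch in Python of the kind of
record I have in mind. The class and field names (and the example values such
as blastp and Swiss-Prot) are purely illustrative assumptions on my part, not
an existing standard:

from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class ToolRecord:
    """Provenance for one tool invocation (field names are illustrative only)."""
    name: str
    version: str
    parameters: dict
    operating_system: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class DatabaseRecord:
    """Provenance for one database accessed during the analysis."""
    name: str
    release: str            # database version / release number
    server_software: str    # server software and its version, if known
    site: str               # the site that was actually queried
    accessed_on: str        # date of access, ISO 8601


@dataclass
class AnalysisDescription:
    """Full description of an analysis: tools used and databases accessed."""
    title: str
    tools: list
    databases: list

    def to_json(self) -> str:
        # asdict() recurses into the nested records, so the whole
        # description can be stored alongside the workflow as JSON.
        return json.dumps(asdict(self), indent=2)


# Example with made-up values, just to show the intended shape.
record = AnalysisDescription(
    title="Example sequence similarity search",
    tools=[ToolRecord(name="blastp", version="2.2.18",
                      parameters={"evalue": 1e-5},
                      operating_system="Linux 2.6",
                      inputs=["query.fasta"], outputs=["hits.xml"])],
    databases=[DatabaseRecord(name="UniProtKB/Swiss-Prot", release="56.0",
                              server_software="(unspecified)",
                              site="http://www.ebi.ac.uk",
                              accessed_on=str(date(2008, 8, 21)))],
)
print(record.to_json())

Something along these lines, stored together with the workflow, would at
least document which tools, versions and database releases produced a given
result, even where the analysis itself cannot be rerun identically.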
At the same time, it would be interesting to see how the results of an
analysis change over time, as new data becomes available in the databases.
Paolo
Paolo Romano (address@hidden)
Bioinformatics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288 Fax: +39-010-5737-295