From: Paolo Missier
Subject: Re: [Myexperiment-discuss] proposal: a "test and controls" section for experiments in myExperiment
Date: Sat, 18 Oct 2008 15:10:54 -0400
User-agent: Thunderbird 2.0.0.17 (Macintosh/20080914)
Hi all,

the need for sanity checks of myExp workflows has been raised before, and not only in the context of the so-called "workflow decay monitor"; indeed, I have long been an advocate of "packing" all the necessary test data along with the workflow. I think this _is_ a topic for myExperiment, rather than for Taverna, because it involves making the published workflow self-consistent, for multiple reasons:
- to clarify its intent. Not being an e-scientist myself, I can only guess that I/O examples are at least as informative to bioinformaticians as a narrative description
- to verify that its current functionality is intact, in view of possible changes in the external services that the workflow depends on. Regression testing after changes (either intentional or not) has long been a key feature of SE practices.
- to verify functional correctness (i.e., no bugs). Notice that this may be subtle: a correct Taverna 1 experiment may translate in some strange way to a Taverna 2 workflow (there are, as we know, backward-compatibility issues) that is no longer "semantically" correct.
I don't see many other options. Adding exception management to SW components is also good SW eng practice in general, but we can't expect users to do that systematically by adding new processors, and as you (Paul) point out this may reduce the readability of the process. On the other hand, provider-supplied error messages may not be informative enough. Also, a feature/limitation of Taverna has always been its weak data typing. Strong typing is used in programming languages to perform sanity checks on input values, amongst other things. Without it, the ability for a third party to verify functionality by testing on the author's supplied I/O data becomes essential.
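To make the "packed test data" idea concrete, here is a minimal sketch of what a regression check against author-supplied example I/O could look like. Everything here is hypothetical: `run_workflow`, the input/output names, and the example values are placeholders for whatever engine and data a real myExperiment entry would provide.

```python
# Hedged sketch: verifying a published workflow against author-supplied
# example I/O. `run_workflow` is a hypothetical stand-in for whatever
# actually executes the workflow (e.g. a Taverna command-line invocation).

def run_workflow(inputs):
    # Placeholder: in practice this would invoke the workflow engine
    # and collect its output ports into a dictionary.
    return {"snp_list": ["rs1", "rs2"]}

# Example I/O the author would "pack" with the workflow on myExperiment.
example_input = {"gene_ids": ["ENSG00000139618", "ENSG00000083093"]}
expected_output = {"snp_list": ["rs1", "rs2"]}

def sanity_check():
    # A third party can re-run the packed example at any time to detect
    # decay in the external services the workflow depends on.
    actual = run_workflow(example_input)
    return actual == expected_output

print(sanity_check())
```

A check like this could be run periodically (or on demand) to flag workflow decay without any understanding of the workflow's science.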
As a simple example of how the intended behaviour of a workflow may be hidden in the guts of the process, here is a bit of a recent private email exchange with Alan and others on a myexp-published workflow:
it seems to me that there was a bug in the workflow (but I am not sure), which only came up because the output "didn't look right" to me (and I barely understand SNPs etc.). Having test I/O available would have (a) spotted this and (b) avoided my having to bother people to verify it.

Regarding this workflow that I wrote about to Alan earlier:
http://www.myexperiment.org/workflows/166 (Retrieve SNPs from regions around known genes)

It works fine, but not with the suggested input, which is an old gene ID that is now archived. I am running it with a list of two gene IDs: ENSG00000139618, ENSG00000083093.

I am not sure the dbSNP processor is wired correctly -- shouldn't it use a dot iteration strategy? As it is, it appears to pick up all possible combinations of chromosome names (one for each input gene), start positions (again one for each input gene) and end positions (same again). In the end it outputs a complex nested list that I don't understand.

I changed the cross to dot product in dbSNP and now I get a much more reasonable 800+ SNPs, organized into two simple lists that make sense to me (I can send them if needed). It also only takes 30 seconds to run now...
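The cross-vs-dot distinction above can be sketched in a few lines. The coordinate values below are made up purely for illustration; the point is only the shape of the two iteration strategies, which behave like `itertools.product` versus `zip`.

```python
from itertools import product

# One entry per input gene (hypothetical coordinates, for illustration only).
chromosomes = ["13", "7"]
starts = [32315474, 27175740]
ends = [32400266, 27245922]

# A "cross product" iteration strategy calls the service on every
# combination, mixing one gene's chromosome with another gene's positions:
cross = list(product(chromosomes, starts, ends))
print(len(cross))  # 2 * 2 * 2 = 8 calls, most of them meaningless

# A "dot product" strategy pairs the lists index-by-index, like zip(),
# so each call keeps one gene's coordinates together:
dot = list(zip(chromosomes, starts, ends))
print(len(dot))  # 2 calls, one per input gene
```

This is why the wrong strategy both blew up the output into a nested list of spurious combinations and made the run much slower: the number of service calls grows multiplicatively under cross product but only linearly under dot.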
So I believe this is indeed an important issue...

--Paolo

Paul Fisher wrote:
I see your point, but this will in most cases be handled directly by the service provider. An example is supplying a gene identifier when a protein identifier is needed. The service would recognise that you have put the wrong ID in, as it simply won't return any results, or will return an error stating that the input was incorrect. This would be a service-side means of error checking.

On the other hand, you COULD add in error checks for the entire workflow, but for an example such as:
http://www.myexperiment.org/workflows/72
you can quickly see that the size of the workflow will become incredibly large if each input is to be checked before it is passed to the next service. Given that people choose to re-use based not only on "if it works" but also on the size of the workflow, and many other things, this may result in workflows no longer being used because they are too big to understand.

Perhaps this discussion should move to the Taverna-Users list instead, as a Taverna feature or a workflow best-practice thought?

I do know that a workflow decay monitor has been in production, and there may be plans for its integration into other projects, such as BioCatalogue. Not that I want to speculate here.

regards,
Paul.