Re: [Myexperiment-discuss] proposal: a "test and controls" section for experiments in myExperiment

On Sat, Oct 18, 2008 at 4:17 PM, Paul Fisher <address@hidden> wrote:

I see your point, but this will in most cases be handled directly by the service provider. An example is supplying a gene identifier when a protein identifier is needed. The service would recognise that you have put the wrong ID in, as it simply won't return any results, or it will return an error stating that the input was incorrect. This would be a service-side means of error checking. On the other hand you COULD add in error checks for the entire workflow, but for an example of:

http://www.myexperiment.org/workflows/72

Well, what I am asking is that every workflow should come with a clear description of all the checks and tests that are made and of those that are not, especially if they are done server side.
You should not always rely on server-side tests; sometimes you have to check things yourself.

you can quickly see that the size of the workflow will be incredibly large, if each input is to be checked before it is passed to the next service.

Given that people choose to re-use not only on "if it works" but also on the size of the workflow

Come on... a good bioinformatician should never use a workflow if he can't be sure that it works correctly. Please try to understand what I was saying in the previous mails.

Perhaps this discussion should move to the Taverna-Users list instead, as a feature for Taverna or a workflow best practice thought?!?!?!

I was also thinking about this.

I do know that a workflow decay monitor has been in production, and there may be plans for its integration into other projects, such as BioCatalogue. Not that I want to speculate here,

regards,
Paul.

Giovanni Marco Dall'Olio wrote:

On Sat, Oct 18, 2008 at 3:34 PM, Paul Fisher <address@hidden> wrote:

How then do you propose to test the workflows, other than to
download them and see if they work.
Do you mean examples of their use, along with experimental results
and a publication, to prove they work?

I am a sympathizer of the philosophy that says you should write unit tests before writing the code:
- http://www.extremeprogramming.org/rules/testfirst.html

That means that for every script I write, I first create test sets, and I don't consider my programs working until they pass all the tests.

There are basically two kinds of tests you can write for bioinformatics: those that verify that your programs don't contain errors, and those that you run each time to prove that you are using the program correctly.

So, in myExperiment I would add a section that explains all the tests that have been run to ensure that the workflow is written correctly.
That would be the first thing I would check when deciding whether to re-use a workflow or not.
In this section, you should add a list of all the tests, their descriptions (like the ones I put in the first mail of this thread), and their results, along with the necessary input data.
Other people will be able to re-run the tests on their computers and tell whether they succeed.
For example: I publish a workflow on myExperiment that makes use of NCBI BLAST.
Then, the NCBI XML interface changes, but I don't notice that, so I don't update my workflow.
The next time somebody else wants to re-use my workflow, he should be able to re-run the same tests with the same input files to see if he obtains exactly the same results, and know that there is something wrong if a different result is returned.
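
A minimal sketch of such a re-runnable test, in Python rather than as a Taverna workflow. Everything named here is an assumption made for illustration: `count_sequences` is a hypothetical stand-in for a published workflow, and `PUBLISHED_INPUT` / `PUBLISHED_DIGEST` stand for the test data and the recorded result that would be stored next to it.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of the output, ignoring trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_regression(run_workflow, test_input: str, expected_digest: str) -> bool:
    """Re-run the workflow on the published input and compare with the recorded result."""
    return digest(run_workflow(test_input)) == expected_digest


def count_sequences(fasta_text: str) -> str:
    """Hypothetical stand-in for a real workflow: counts the records in FASTA text."""
    return str(sum(1 for line in fasta_text.splitlines() if line.startswith(">")))


# Test data and result as they would be recorded when the workflow is published.
PUBLISHED_INPUT = ">seq1\nACGT\n>seq2\nGGCC\n"
PUBLISHED_DIGEST = digest("2")

if __name__ == "__main__":
    ok = check_regression(count_sequences, PUBLISHED_INPUT, PUBLISHED_DIGEST)
    print("regression test passed" if ok else "regression test FAILED: output changed")
```

If a remote service changes its output format, the digest of the new output no longer matches the recorded one, and whoever re-runs the test sees the failure before trusting the workflow.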

Second, I would add a section for the tests that should be used to demonstrate that you are using the workflow correctly.
Let's say you publish a workflow which has an input called 'fasta file'.
This would mean that the workflow needs a fasta file as input; but maybe I could misunderstand your description, and put the literal string 'fasta file' as input.
Your workflow should contain a processor that checks that the input file is ok, and an output that says 'input fasta file is not ok!' if it is not.
That would be the run-time test.
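
A run-time check of this kind could look roughly like the following Python sketch. The function name `check_fasta` and the set of accepted sequence characters are assumptions made for illustration; a real workflow processor might apply stricter or looser rules.

```python
import string

ERROR = "input fasta file is not ok!"
# Assumption: sequence lines may contain letters plus '*' (stop) and '-' (gap).
ALLOWED = set(string.ascii_letters + "*-")


def check_fasta(text: str) -> str:
    """Return 'ok' if the text looks like FASTA, otherwise the error message."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return ERROR
    previous_was_header = False
    for line in lines:
        if line.startswith(">"):
            if previous_was_header:  # a header with no sequence after it
                return ERROR
            previous_was_header = True
        else:
            if not set(line) <= ALLOWED:  # unexpected characters in the sequence
                return ERROR
            previous_was_header = False
    # The last record must also have a sequence.
    return ERROR if previous_was_header else "ok"


if __name__ == "__main__":
    print(check_fasta("fasta file"))      # the literal string, not a FASTA file
    print(check_fasta(">seq1\nACGT\n"))   # a well-formed single record
```

The literal string 'fasta file' fails the check because its first line does not start with '>'.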

Tests are not always example files, but usually one writes at least one test with such inputs.
I am not a programming guru myself, but I hope I have been able to explain what I want to say.

Or

Do you want someone to upload example inputs for (as stated in my
last email).

Either way, you would only truly know if they worked if you
tested it for yourself, as with any other program!

Paul.

Giovanni Marco Dall'Olio wrote:

On Sat, Oct 18, 2008 at 3:00 PM, Paul Fisher <address@hidden> wrote:

Hi,

I understood what you were trying to say in your email, but I'm
not sure it came across properly. I think you may have confused a
few people with cross-discipline vocabulary :)

I hope I have this right in a summing up statement:

/You need examples of what the workflow should work on, more
precisely: inputs and outputs/

No.
I am talking about software testing
(http://en.wikipedia.org/wiki/Software_testing).
If I want to re-use a workflow from myExperiment, I want
something that proves to me that it works correctly.
It is good scientific practice. Just as you wouldn't use a
laboratory instrument that is not calibrated, you wouldn't use a
program that is not tested.

I don't know how to explain it better than I did in my
last mail. Maybe you can read this article:
-
http://www.americanscientist.org/issues/pub/wheres-the-real-bottleneck-in-scientific-computing/1

I think this has already been mentioned before - more precisely
work on attachments (which I'm very keen to see personally).
Correct me if I'm wrong though, people.

regards,
Paul.

Giovanni Marco Dall'Olio wrote:

Hi,
I think you should add a section for describing 'Test and
Controls' in the 'Detailed view' of every workflow in
myExperiment.

What do I mean?
Protocols and pipelines are always tested in experimental
biology.
For example, let's say you want to design a new protocol for
extracting DNA from blood samples.
You will have to spend much of your time devising controls
that will allow you to demonstrate that your protocol is good.
You will need to demonstrate that PCR amplification doesn't
amplify contaminants, you'll have to calibrate all the
instruments, and include control and comparison samples.

The same goes for any bioinformatics workflow. A pipeline for
a scientific experiment should follow all the good laboratory
practices; it doesn't matter whether the instruments used are
physical machinery or bioinformatics tools.

For example, I am going to write a script to calculate a
statistic on a large amount of data.
Up to now I have thought of three tests:
- the workflow should fail if wrong input files are given
- the workflow should give me the right result when I run it
on testing data for which I already know the value of the
statistic.
- if I create two random sets of sequences, one with more
variability than the other, the workflow should give me a
higher output value for the first set than for the
second one.
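
The three tests listed above can be sketched in Python as follows. The statistic used here, `mean_pairwise_distance` (mean Hamming distance over all pairs of sequences), is only a hypothetical stand-in, since the thread does not say which statistic the script computes; the sequence length, mutation rate and random seed are likewise arbitrary choices.

```python
import itertools
import random


def mean_pairwise_distance(seqs):
    """Hypothetical statistic: mean Hamming distance over all pairs of sequences."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal length")
    if any(set(s) - set("ACGT") for s in seqs):
        raise ValueError("sequences must contain only A, C, G, T")
    pairs = list(itertools.combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def test_wrong_input_fails():
    # Test 1: wrong input must be rejected, not silently processed.
    for bad in (["fasta file", "ACGT"], ["ACGX", "ACGT"], ["ACGT"]):
        try:
            mean_pairwise_distance(bad)
        except ValueError:
            continue
        raise AssertionError(f"no error raised for {bad!r}")


def test_known_value():
    # Test 2: pair distances are 1, 2 and 1, so the mean is 4/3.
    result = mean_pairwise_distance(["AAAA", "AAAT", "AATT"])
    assert abs(result - 4 / 3) < 1e-9


def test_more_variable_set_scores_higher():
    # Test 3: a more variable set must give a higher value than a conserved one.
    rng = random.Random(42)  # fixed seed so the test is repeatable

    def random_seq(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    def mutate(seq, rate):
        return "".join(rng.choice("ACGT") if rng.random() < rate else c for c in seq)

    base = random_seq(50)
    variable = [random_seq(50) for _ in range(10)]
    conserved = [mutate(base, 0.05) for _ in range(10)]
    assert mean_pairwise_distance(variable) > mean_pairwise_distance(conserved)


if __name__ == "__main__":
    for test in (test_wrong_input_fails, test_known_value,
                 test_more_variable_set_scores_higher):
        test()
        print(test.__name__, "passed")
```

Publishing tests like these together with their input data would let other users re-run them and would also serve as usage examples.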

You should add a section where people can describe the kinds
of tests with which their workflows have been calibrated.
Possibly you could have two sections, one with the tests that
have already been executed, and one for those that should be
run each time the workflow is used.
I think such a section would be very useful in myExperiment.
Moreover, these tests could also act as examples, so workflows
will be easier for other users to understand.

I believe testing workflows is a very good practice, one that
unfortunately not many bioinformaticians are used to following :(.
You should distinguish the workflows that provide test
descriptions from the others, so that people will be able to
suggest how to design tests to those who are not used to doing so.

-- -----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it
------------------------------------------------------------------------

_______________________________________________
Myexperiment-discuss mailing list
address@hidden

http://lists.nongnu.org/mailman/listinfo/myexperiment-discuss


From:	Giovanni Marco Dall'Olio
Subject:	Re: [Myexperiment-discuss] proposal: a "test and controls" section for experiments in myExperiment
Date:	Sat, 18 Oct 2008 16:29:25 +0200