Re: [Myexperiment-discuss] proposal: a "test and controls" section for experiments in myExperiment

On Sat, Oct 18, 2008 at 3:34 PM, Paul Fisher <address@hidden> wrote:

How then do you propose to test the workflows, other than by downloading them and seeing if they work?
Do you mean examples of their use, along with experimental results and a publication, to prove they work?

I am a supporter of the philosophy that says you should write unit tests before writing the code:
- http://www.extremeprogramming.org/rules/testfirst.html

That means that for every script I write, I first create test sets, and I don't consider my programs to be working until they pass all the tests correctly.

There are basically two kinds of tests you can write in bioinformatics: those that verify that your programs don't contain errors, and those that you run each time to prove that you are using the program correctly.

So, in myExperiment I would add a section that explains all the tests that have been run to ensure that the workflow is written correctly.
That would be the first thing I'd check when deciding whether to re-use a workflow.
In this section, you should add a list of all the tests, their descriptions (like the ones I put in the first mail of this thread), and their results, along with the necessary input data.
Other people will be able to re-run the tests on their computers and tell whether they succeed.
For example: I publish a workflow on myExperiment that makes use of NCBI BLAST.
Then the NCBI XML interface changes, but I don't notice, so I don't update my workflow.
The next time somebody else wants to re-use my workflow, they should be able to re-run the same tests with the same input files to see whether they obtain exactly the same results, and know that something is wrong if a different result is returned.
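
To make the BLAST example concrete, here is a minimal sketch in Python of what a recorded test could look like: the author stores each test's description, its input data and a SHA-256 digest of the output obtained when publishing, and anyone can later re-run the workflow on the same input and compare digests. This is only an illustration under stated assumptions: the workflow is modelled as a plain Python function from input bytes to output bytes (a real myExperiment workflow would be run in its own workflow engine), and `RecordedTest`, `record_test` and `rerun_tests` are hypothetical names, not part of myExperiment.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model: a workflow is any function from input bytes to output bytes.
Workflow = Callable[[bytes], bytes]


@dataclass
class RecordedTest:
    """One entry of a hypothetical 'Tests and Controls' section."""
    name: str
    description: str
    input_data: bytes      # input data published alongside the workflow
    expected_sha256: str   # digest of the output the author obtained


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_test(name: str, description: str, input_data: bytes,
                workflow: Workflow) -> RecordedTest:
    """Author side: run the workflow once and store the digest of its output."""
    return RecordedTest(name, description, input_data,
                        sha256_hex(workflow(input_data)))


def rerun_tests(workflow: Workflow,
                tests: List[RecordedTest]) -> Dict[str, bool]:
    """Re-user side: True if the output digest still matches, False otherwise."""
    return {t.name: sha256_hex(workflow(t.input_data)) == t.expected_sha256
            for t in tests}


if __name__ == "__main__":
    def published(data: bytes) -> bytes:
        # Stand-in for the workflow as it behaved when it was published.
        return data.upper()

    def after_upstream_change(data: bytes) -> bytes:
        # Simulates a remote service silently changing its output format.
        return data.upper().rstrip()

    tests = [record_test("upper", "output is the upper-cased input",
                         b"acgt\n", published)]
    print(rerun_tests(published, tests))              # {'upper': True}
    print(rerun_tests(after_upstream_change, tests))  # {'upper': False}
```

Comparing digests only works if the workflow output is deterministic; outputs that embed dates or run identifiers would need to be normalised before hashing.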

Second, I would add a section for the tests that should be used to demonstrate that you are using the workflow correctly.
Let's say you publish a workflow which has an input called 'fasta file'.
This would mean that the workflow needs a fasta file as input; but maybe I could misunderstand your description and put the literal string 'fasta file' as input.
Your workflow should contain a processor that checks that the input file is ok, and an output that says 'input fasta file is not ok!' if it is not.
That would be the run-time test.
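
A run-time control of this kind could be sketched as below. It is a deliberately minimal structural check, assuming that the input is the text of a FASTA file and that sequence lines may contain only letters, '*' or '-'; real FASTA validation could be stricter (for example restricting letters to a nucleotide or protein alphabet). The function names and the 'input fasta file ok' message are hypothetical; only the error message comes from the example above.

```python
import re

# Assumption: a sequence line may contain any letter (nucleotide or protein
# codes), '*' (stop) or '-' (gap). A real check could use a stricter alphabet.
SEQUENCE_LINE = re.compile(r"[A-Za-z*\-]+")


def is_valid_fasta(text: str) -> bool:
    """Minimal structural check: '>' header lines, each followed by sequence lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return False  # empty input, or e.g. the literal string 'fasta file'
    record_has_sequence = True  # no record is open before the first header
    for line in lines:
        if line.startswith(">"):
            if not record_has_sequence:
                return False  # the previous header had no sequence
            record_has_sequence = False
        elif SEQUENCE_LINE.fullmatch(line):
            record_has_sequence = True
        else:
            return False  # a line that is neither a header nor a sequence
    return record_has_sequence  # the last header must have a sequence too


def check_input(text: str) -> str:
    """Run-time control: the message the workflow would expose as an output."""
    if is_valid_fasta(text):
        return "input fasta file ok"
    return "input fasta file is not ok!"


if __name__ == "__main__":
    print(check_input("fasta file"))         # input fasta file is not ok!
    print(check_input(">seq1\nACGTACGT\n"))  # input fasta file ok
```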

Tests are not always example files, but usually one writes at least one test with such inputs.
I am not a programming guru myself, but I hope I have been able to explain what I want to say.

Or

Do you want someone to upload example inputs for them (as stated in my last email)?

Either way, you would only truly know if they worked if you tested them yourself, as with any other program!

Paul.

Giovanni Marco Dall'Olio wrote:

On Sat, Oct 18, 2008 at 3:00 PM, Paul Fisher <address@hidden> wrote:

Hi,

I understood what you were trying to say in your email, but I'm
not sure it came across properly. I think you may have confused a
few people with cross-discipline vocabulary :)

I hope I have this right in a summary statement:

/You need examples of what the workflow should work on, more
precisely: inputs and outputs/

No.
I am talking about software testing (http://en.wikipedia.org/wiki/Software_testing).
If I want to re-use a workflow from myExperiment, I want something that proves to me that it works correctly.
It is good scientific practice: just as you wouldn't use a laboratory instrument that is not calibrated, you shouldn't use a program that is not tested.

I don't know how to explain it better than I did in my last mail. Maybe you can read this article:
- http://www.americanscientist.org/issues/pub/wheres-the-real-bottleneck-in-scientific-computing/1

I think this has already been mentioned before - more precisely
work on attachments (which I'm very keen to see personally).
Correct me if I'm wrong, though, people.

regards,
Paul.

Giovanni Marco Dall'Olio wrote:

Hi,
I think you should add a section in which to describe 'Test and
Controls' in the 'Detailed view' for every workflow in
myExperiment.

What do I mean?
In experimental biology, protocols and pipelines are always
tested.
For example, let's say you want to design a new protocol for
extracting DNA from blood samples.
You will have to spend much of your time devising controls
that will allow you to demonstrate that your protocol is good.
You will need to demonstrate that PCR amplification doesn't
amplify contaminants, you'll have to calibrate all the
instruments, and include control and comparison samples.

The same goes for any bioinformatics workflow. A pipeline for
a scientific experiment should follow all the good laboratory
practices; it doesn't matter whether the instruments used are
physical machinery or bioinformatics tools.

For example, I am going to write a script to compute a
statistic on a large amount of data.
Up to now I have thought of three tests:
- the workflow should fail if wrong input files are given
- the workflow should give me the right result when I run it
on test data for which I already know the value of the statistic.
- If I create two random sets of sequences, one with more
variability than the other, the workflow should give me a
higher output value for the first set than for the second one.
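
The three tests above could be written test-first with Python's `unittest` module, as in the sketch below. The statistic itself is not specified in this thread, so as a stand-in the sketch assumes a hypothetical one: the mean proportion of differing sites over all pairs of equal-length DNA sequences (`mean_pairwise_distance` is an illustrative name). The third test builds a random set and a set of near-copies of one sequence with a fixed seed, so the control is reproducible from run to run.

```python
import random
import unittest
from itertools import combinations

VALID_BASES = set("ACGT")


def mean_pairwise_distance(seqs):
    """Hypothetical statistic: mean proportion of differing sites over all pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    if any(set(s) - VALID_BASES for s in seqs):
        raise ValueError("sequences may only contain A, C, G and T")
    pairs = list(combinations(seqs, 2))
    distances = [sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs]
    return sum(distances) / len(pairs)


class TestMeanPairwiseDistance(unittest.TestCase):
    def test_fails_on_wrong_input(self):
        # Test 1: wrong input must raise an error instead of returning a number.
        for bad in ([], ["ACGT"], ["ACGT", "ACG"], ["ACGT", "ACGN"]):
            with self.assertRaises(ValueError):
                mean_pairwise_distance(bad)

    def test_known_value(self):
        # Test 2: the pairs differ at 1/4, 2/4 and 1/4 of sites, so the mean is 1/3.
        self.assertAlmostEqual(
            mean_pairwise_distance(["AAAA", "AAAT", "AATT"]), 1 / 3)

    def test_more_variable_set_scores_higher(self):
        # Test 3: fully random sequences vs. near-copies of a single sequence.
        rng = random.Random(42)  # fixed seed, so the control is reproducible
        length, n = 200, 20
        variable = ["".join(rng.choice("ACGT") for _ in range(length))
                    for _ in range(n)]
        template = variable[0]
        # Each site is resampled with probability 0.05, giving low variability.
        similar = ["".join(rng.choice("ACGT") if rng.random() < 0.05 else base
                           for base in template)
                   for _ in range(n)]
        self.assertGreater(mean_pairwise_distance(variable),
                           mean_pairwise_distance(similar))
```

Saved as a file such as `test_statistic.py` (an example name), it can be run with `python -m unittest test_statistic`.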

You should add a section where people can write which
kinds of tests their workflows have been calibrated with. Possibly
you could provide two sections, one with the tests that have
already been executed, and one for the tests one should run
each time one uses the workflow.
I think such a section would be very useful in myExperiment.
Moreover, these tests could also act as examples, so workflows
will be easier for other users to understand.

I believe testing workflows is a very good practice, which
unfortunately not many bioinformaticians are used to following :(.
You should distinguish the workflows that provide test
descriptions from the others, so people will be able to suggest
how to design tests to people who are not used to doing that.

-- -----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it
------------------------------------------------------------------------

_______________________________________________
Myexperiment-discuss mailing list
address@hidden

http://lists.nongnu.org/mailman/listinfo/myexperiment-discuss


From:	Giovanni Marco Dall'Olio
Subject:	Re: [Myexperiment-discuss] proposal: a "test and controls" section for experiments in myExperiment
Date:	Sat, 18 Oct 2008 16:04:55 +0200