On Sat, Oct 18, 2008 at 3:34 PM, Paul Fisher <address@hidden> wrote:
How then do you propose to test the workflows, other than to
download them and see if they work?
Do you mean examples of their use, along with experimental results
and a publication, to prove they work?
I am a sympathizer of the philosophy that says you should write
unit tests before writing the code:
- http://www.extremeprogramming.org/rules/testfirst.html
That means that for every script I write, I first create test sets,
and I don't consider my programs to be working until they pass
all the tests correctly.
There are basically two kinds of tests you can write for
bioinformatics: those that verify that your programs don't contain
errors, and those that you run each time to prove that you are using
the program correctly.
So, in myExperiment I would add a section that explains all the tests
that have been run to ensure that the workflow is written correctly.
That would be the first thing I would check when deciding whether to
re-use a workflow.
In this section, you should add a list of all the tests, their
description (like the one I put in the first mail of this thread), and
their results, along with the necessary input data.
Other people will be able to re-run the tests on their computers and
tell if they succeed.
For example: I publish a workflow on myExperiment that makes use of
NCBI BLAST.
Then the NCBI XML interface changes, but I don't notice, so I
don't update my workflow.
The next time somebody else wants to re-use my workflow, they should be
able to re-run the same tests with the same input files to see whether
they obtain exactly the same results, and know that something is wrong
if a different result is returned.
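As a rough sketch of the re-run idea above (not something myExperiment
provides), the published tests could store a digest of the expected
output next to the input files, so that anyone can compare against it
later. The names run_workflow, toy_workflow and check_regression are
hypothetical, and the toy workflow only stands in for a real one such
as a BLAST call:

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of some output bytes."""
    return hashlib.sha256(data).hexdigest()


def toy_workflow(data: bytes) -> bytes:
    """Hypothetical stand-in for a real workflow: upper-case the input."""
    return data.upper()


def check_regression(run_workflow, test_input: bytes, expected_digest: str) -> bool:
    """Re-run the workflow on a stored input and compare with the recorded digest."""
    return sha256_of(run_workflow(test_input)) == expected_digest


if __name__ == "__main__":
    stored_input = b"acgt\n"
    # The author records this digest once, when publishing the workflow.
    recorded = sha256_of(toy_workflow(stored_input))
    # Anyone can later re-run the test; False would mean the behaviour changed.
    print(check_regression(toy_workflow, stored_input, recorded))
```

Comparing digests only works when the output is deterministic; results
that carry timestamps or version strings would need those fields
stripped before hashing.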
Second, I would add a section for the tests that should be used to
demonstrate that you are using the workflow correctly.
Let's say you publish a workflow which has an input called 'fasta file'.
This would mean that the workflow needs a fasta file as input; but
maybe I could misunderstand your description, and put the literal
string 'fasta file' as input.
Your workflow should contain a processor that checks that the input
file is ok, and an output that should say 'input fasta file is not
ok!' if it is not.
That would be the run-time test.
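A minimal sketch of such a run-time check, assuming a deliberately
loose definition of FASTA (header lines starting with '>', each
followed by at least one line of letters, '*' or '-'). The function
names is_fasta and check_input are hypothetical; a real workflow would
probably rely on an established parser instead:

```python
def is_fasta(text: str) -> bool:
    """Loose FASTA check: every record is a '>' header plus sequence lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    have_sequence = False  # True once the current record has a sequence line
    for index, line in enumerate(lines):
        if line.startswith(">"):
            # A new header is only valid if the previous record had a sequence.
            if index > 0 and not have_sequence:
                return False
            have_sequence = False
        else:
            if not all(ch.isalpha() or ch in "*-" for ch in line):
                return False
            have_sequence = True
    return have_sequence


def check_input(text: str) -> str:
    """Return the message the workflow would expose as an output."""
    if is_fasta(text):
        return "input fasta file is ok"
    return "input fasta file is not ok!"


if __name__ == "__main__":
    print(check_input("fasta file"))      # the literal string from the example
    print(check_input(">seq1\nACGTAC\n"))
```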
Tests are not always example files, but one usually writes at
least one test with such inputs.
I am not a programming guru myself, but I hope I have been able to
explain what I want to say.
Or
Do you want someone to upload example inputs for (as stated in my
last email).
Either way, you would only truly know if they worked if you
tested it for yourself, as with any other program!
Paul.
Giovanni Marco Dall'Olio wrote:
On Sat, Oct 18, 2008 at 3:00 PM, Paul Fisher <address@hidden> wrote:
Hi,
I understood what you were trying to say in your email, but I'm
not sure it came across properly. I think you may have confused a
few people with cross-discipline vocabulary :)
I hope I have this right in a summing up statement:
/You need examples of what the workflow should work on, more
precisely: inputs and outputs/
No.
I am talking about software testing
(http://en.wikipedia.org/wiki/Software_testing).
If I want to re-use a workflow from myExperiment, I want
something that proves to me that it works correctly.
It is good scientific practice. Just as you wouldn't use a
laboratory instrument that is not calibrated, you wouldn't use a
program that is not tested.
I don't know how to explain it better than I did in my
last mail. Maybe you can read this article:
- http://www.americanscientist.org/issues/pub/wheres-the-real-bottleneck-in-scientific-computing/1
I think this has already been mentioned before - more precisely
work on attachments (which I'm very keen to see personally).
Correct me if I'm wrong though, people.
regards,
Paul.
Giovanni Marco Dall'Olio wrote:
Hi,
I think you should add a section in which to describe 'Tests and
Controls' in the 'Detailed view' for every workflow in
myExperiment.
What do I mean?
Protocols and pipelines are always tested in experimental
biology.
For example, let's say you want to design a new protocol for
extracting DNA from blood samples.
You will have to spend much of your time devising controls
that will allow you to demonstrate that your protocol is good.
You will need to demonstrate that PCR amplification doesn't
amplify contaminants, you'll have to calibrate all the
instruments, and include control and comparison samples.
The same goes for any bioinformatics workflow. A pipeline for
a scientific experiment should follow all the good laboratory
practices; it doesn't matter whether the instruments used are
physical machines or bioinformatics tools.
For example, I am going to write a script to calculate a
statistic on a large amount of data.
Up to now I have thought of three tests:
- the workflow should fail if wrong input files are given
- the workflow should give me the right result when I run it
on test data for which I already know the value of the statistic.
- If I create two random sets of sequences, one with more
variability than the other, the workflow should give me a
higher output value for the first set than for the
second one.
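To make the three tests concrete, here is a small sketch. The
statistic is an assumption chosen only for illustration (the mean
fraction of differing positions over all pairs of sequences), and the
names diversity and random_set are hypothetical:

```python
import random
from itertools import combinations


def diversity(seqs):
    """Mean fraction of differing sites over all pairs of equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    if any(set(s) - set("ACGT") for s in seqs):
        raise ValueError("sequences may contain only A, C, G and T")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def random_set(rng, n, length, rate):
    """Build n copies of one random sequence, resampling each site with probability rate."""
    base = "".join(rng.choice("ACGT") for _ in range(length))
    return [
        "".join(rng.choice("ACGT") if rng.random() < rate else c for c in base)
        for _ in range(n)
    ]


def test_rejects_wrong_input():
    # Test 1: the workflow should fail if wrong input is given.
    for bad in (["fasta file", "fasta file"], ["ACGT"], ["ACGT", "AC"]):
        try:
            diversity(bad)
        except ValueError:
            continue
        raise AssertionError("bad input was accepted: %r" % (bad,))


def test_known_value():
    # Test 2: one differing site out of four, in a single pair, gives 0.25.
    assert diversity(["AAAA", "AAAT"]) == 0.25


def test_more_variable_set_scores_higher():
    # Test 3: the more variable random set should score higher.
    rng = random.Random(0)
    high = random_set(rng, n=20, length=50, rate=0.5)
    low = random_set(rng, n=20, length=50, rate=0.05)
    assert diversity(high) > diversity(low)


if __name__ == "__main__":
    test_rejects_wrong_input()
    test_known_value()
    test_more_variable_set_scores_higher()
    print("all tests passed")
```

The third test is statistical in nature; the resampling rates are set
far apart so that the comparison does not hinge on one lucky seed.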
You should add a section where people can describe the kinds of
tests with which their workflows have been calibrated. Possibly
you could have two sections, one with the tests that have
already been executed, and one with the tests that should be run
each time the workflow is used.
I think such a section would be very useful in myExperiment.
Moreover, these tests could also act as examples, so workflows
will be easier for other users to understand.
I believe testing workflows is a very good practice that
unfortunately not many bioinformaticians are used to following :(.
You should distinguish the workflows that provide test
descriptions from the others, so people will be able to suggest
how to design tests to those who are not used to doing that.
--
-----------------------------------------------------------
My Blog on Bioinformatics (italian): http://bioinfoblog.it
------------------------------------------------------------------------
_______________________________________________
Myexperiment-discuss mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/myexperiment-discuss