
Re: [lmi] Automated GUI testing, revisited


From: Greg Chicares
Subject: Re: [lmi] Automated GUI testing, revisited
Date: Wed, 12 Nov 2014 01:55:21 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 2014-11-09 01:21Z, Greg Chicares wrote:
> Now that we have an automated GUI test, I'm starting this new thread
> to ask some questions about what it's intended to do, and to discuss
> where we should go from here.
> 
> I created a 'wx_test.conf' from an old off-list message, and hoped
> this would cause more tests to pass, but I still get:
>   FAILURE: 11 out of 24 tests failed
> (full output below[0]). I understand that the 'benchmark_census'
> won't pass until I get copies of the (proprietary) files it uses

Now I have those files. I still get
  FAILURE: 11 out of 24 tests failed.
but at least the "benchmark" test fails later:
  benchmark_census: ERROR (Assertion 'std::fabs(diff_in_percents) < 10' failed
  (expected 11000ms, but actually took 456ms, i.e. -95.85%).
  [file /lmi/src/lmi/wx_test_benchmark_census.cpp, line 101] )
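
For reference, the arithmetic behind that assertion message seems to
be a plain relative difference; a rough sketch (my own names--not
necessarily what 'wx_test_benchmark_census.cpp' literally does):

  #include <cmath>
  #include <iostream>

  int main()
  {
      double const expected = 11000.0; // time_disk from 'wx_test.conf', ms
      double const actual   =   456.0; // measured duration, ms
      // Relative difference in percent:
      //   100 * (456 - 11000) / 11000 = -95.85%
      double const diff_in_percents =
          100.0 * (actual - expected) / expected;
      // The test evidently requires the measurement to be within ten
      // percent of the configured value, so this check fails.
      std::cout
          << diff_in_percents << "% "
          << (std::fabs(diff_in_percents) < 10.0 ? "pass" : "fail")
          << '\n'
          ;
  }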

Vadim--I believe I copied 'wx_test.conf' from an old email that
you had sent. Was "time_disk=11000" an actual measurement? For
single tasks, your machine should be faster than mine; so I'm
wondering whether we're running the same test.

My "456ms" above looks a lot like "time_run" in 'wx_test.conf':
  time_run=434
  time_disk=11000
  time_spreadsheet=710
(I mention this without having investigated further.)

Now I'm wondering whether 'wx_test.conf' is really necessary.
Its main purpose is to name input files that use proprietary
products, but it appears that in practice we're using it for
files that match 'MSEC_*.cns', which could just as easily be
distinguished by pattern. (We might make that 'wx_test_*.cns'
for clarity.)
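
To be concrete, "distinguished by pattern" needn't mean any
particular globbing API; a plain prefix-and-suffix test would do.
A sketch (the 'wx_test_' prefix is only the hypothetical renaming
suggested above):

  #include <string>

  // True iff 'filename' matches the hypothetical 'wx_test_*.cns'
  // pattern, i.e., has that prefix and that suffix.
  bool is_wx_test_census(std::string const& filename)
  {
      std::string const prefix("wx_test_");
      std::string const suffix(".cns");
      return
             prefix.size() + suffix.size() <= filename.size()
          && 0 == filename.compare(0, prefix.size(), prefix)
          && 0 == filename.compare
              (filename.size() - suffix.size(), suffix.size(), suffix)
          ;
  }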

It also includes timing data, but I'm not sure that's the best
approach. For system testing, we create about 1400 files--a
hundred MB of data--and store 'md5sum *' in a 1400-line file.
The latest md5sum file, and the last one known to be good, are
saved with fixed names and locations to make it easy to compare
them, so we can easily tell whether anything has changed. When
anything in the 100 MB dataset differs, it's analyzed (the
logic's in the makefiles) and any material differences are
described on stdout or stderr; a message notifies us if nothing
differs.
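
(The makefile logic does considerably more than this, but the core
of "comparing md5sum files" amounts to something like the following
sketch, which reads two 'md5sum *' outputs and reports files added,
removed, or changed:

  #include <fstream>
  #include <iostream>
  #include <map>
  #include <sstream>
  #include <string>

  // Read 'md5sum *' output ("checksum  filename" per line) into a
  // map from filename to checksum. Filenames containing spaces are
  // not handled; this is a sketch only.
  std::map<std::string,std::string> read_md5sums(char const* path)
  {
      std::map<std::string,std::string> m;
      std::ifstream is(path);
      std::string line;
      while(std::getline(is, line))
      {
          std::istringstream iss(line);
          std::string sum, name;
          if(iss >> sum >> name) m[name] = sum;
      }
      return m;
  }

  int main(int argc, char* argv[])
  {
      if(3 != argc) return 1;
      auto const old_sums = read_md5sums(argv[1]); // last known good
      auto const new_sums = read_md5sums(argv[2]); // latest run
      for(auto const& p : new_sums)
      {
          auto const i = old_sums.find(p.first);
          if(old_sums.end() == i)
              std::cout << "added:   " << p.first << '\n';
          else if(i->second != p.second)
              std::cout << "changed: " << p.first << '\n';
      }
      for(auto const& p : old_sums)
          if(0 == new_sums.count(p.first))
              std::cout << "removed: " << p.first << '\n';
  }
)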

Sometimes, results change (e.g., because we corrected an error),
or tests are added or removed, and comparing md5sum files tells
us what changed. When we've validated the changes, the current
md5sum file becomes the new last-known-good touchstone. Thus,
I ran the system test three times on 20141022 and saved the
results in these files:
  md5sums-20141022T2055Z
  md5sums-20141022T2100Z
  md5sums-20141022T2107Z
and I also have results that Kim sent me:
  md5sums-20141022T1722Z-kmm
My middle test matched hers perfectly. Another matched except
for one file that was added or removed...and so on.
Just last week she and I were comparing current results for a
particular set of tests to a touchstone saved years ago. We're
very comfortable with this workflow, and it has served us well.

I think a similar workflow for 'wx_test' makes the most sense.
We'll run 'wx_test' often (maybe whenever we run a system test),
and save a series of results, e.g. as 'wx_test-20141022T2055Z'
etc. to parallel 'md5sums-20141022T2055Z' above. Comparing a new
file side by side against a saved touchstone[0] lets us see what
changed, e.g.
+ MSEC_0.cns  run=434 disk=11000 spreadsheet=710
- MSEC_0.cns  run=441 disk=11022 spreadsheet=708
If we feel it's necessary, we can read those values and compare
them numerically, naming those that differ by more than a given
tolerance, or stating that none differ materially. That can be
done in an external program, so we don't need timings like this
  time_run=434
  time_disk=11000
  time_spreadsheet=710
in 'wx_test.conf'. But then the configuration file isn't needed
at all.
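
As a sketch of such an external program (the file format, field
names, and five-percent tolerance below are illustrative only, not
anything that exists today), it could read a saved touchstone and a
new results file in the
  MSEC_0.cns  run=434 disk=11000 spreadsheet=710
format above, and name any timing that differs by more than the
tolerance:

  #include <cmath>
  #include <cstdlib>
  #include <fstream>
  #include <iostream>
  #include <map>
  #include <sstream>
  #include <string>

  using timings = std::map<std::string,double>; // field name --> ms

  // Parse lines like "MSEC_0.cns  run=434 disk=11000 spreadsheet=710"
  // into a map from census name to its timing fields.
  std::map<std::string,timings> read_results(char const* path)
  {
      std::map<std::string,timings> r;
      std::ifstream is(path);
      std::string line;
      while(std::getline(is, line))
      {
          std::istringstream iss(line);
          std::string census, field;
          if(!(iss >> census)) continue;
          while(iss >> field)
          {
              auto const eq = field.find('=');
              if(std::string::npos != eq)
                  r[census][field.substr(0, eq)] =
                      std::stod(field.substr(eq + 1));
          }
      }
      return r;
  }

  // Hypothetical usage: compare_timings touchstone_file latest_file
  int main(int argc, char* argv[])
  {
      if(3 != argc) return EXIT_FAILURE;
      double const tolerance = 5.0; // maximum tolerable difference, in %
      auto const touchstone = read_results(argv[1]);
      auto const latest     = read_results(argv[2]);
      bool any_material_difference = false;
      for(auto const& c : latest)
      {
          auto const t = touchstone.find(c.first);
          if(touchstone.end() == t) continue;
          for(auto const& f : c.second)
          {
              auto const old = t->second.find(f.first);
              if(t->second.end() == old || 0.0 == old->second) continue;
              double const d =
                  100.0 * (f.second - old->second) / old->second;
              if(tolerance < std::fabs(d))
              {
                  any_material_difference = true;
                  std::cout
                      << c.first << ' ' << f.first
                      << " differs by " << d << "%\n";
              }
          }
      }
      if(!any_material_difference)
          std::cout << "No material timing differences.\n";
  }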

---------

[0] "Comparing a new file side by side against a saved touchstone"

Vadim--In the office, we use a shareware tool, but I'm slowly
preparing to move to a free OS; do you have any recommendation
for a free (as in freedom) GUI file-comparison tool? Here are
a few ideas:
  http://stackoverflow.com/questions/112932/graphical-diff-programs-for-linux



