Re: Automated testing for users' LilyPond collections with new development versions
From: Jean Abou Samra
Subject: Re: Automated testing for users' LilyPond collections with new development versions
Date: Wed, 30 Nov 2022 23:44:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0
On 28/11/2022 at 23:49, Karlin High wrote:
This message intrigued me:
<https://lists.gnu.org/archive/html/lilypond-devel/2022-11/msg00222.html>
In it, Eric Benson reported a setup that allows testing new versions
of LilyPond on a sizable body of work in a somewhat automated fashion.
Now, could automation like that also make use of the infrastructure
for LilyPond's regression tests?
<http://lilypond.org/doc/v2.23/Documentation/contributor/regtest-comparison>
What effort/value would there be in making an enhanced convert-ly tool
that tests a new version of LilyPond on a user's entire collection of
work, reporting differences between old and new versions in
performance and output?
Enabling something like this:
* A new release of LilyPond comes out. Please test.
* Advanced users with large collections of LilyPond files do the
equivalent of "make test-baseline," but for their collection instead
of LilyPond's regtests. Elapsed time is recorded, along with CPU and
RAM information as appropriate.
* The new LilyPond gets installed.
* An upgrade script runs convert-ly on the collection, first offering
a backup via convert-ly options or tarball-style (see the sketch after
this list).
* The equivalent of "make check" runs.
* A report is generated, optionally as email to lilypond-devel, with a
summary of regression test differences and old-vs-new elapsed time.
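To make the idea concrete, here is a minimal, hypothetical sketch in
Python of what such an upgrade-and-retime script might look like. The
binary names (lilypond-2.24.0, lilypond-2.25.0) and the ~/scores path
are placeholders, not anything LilyPond ships; the sketch only backs up
the sources, runs convert-ly, and compares total compile time, leaving
aside the harder image-by-image comparison that "make check" performs.

#!/usr/bin/env python3
# Hypothetical sketch of the proposed workflow: back up a collection,
# run convert-ly, compile with an old and a new LilyPond, compare timings.
# Binary names and paths below are placeholders for illustration only.
import shutil
import subprocess
import time
from pathlib import Path

COLLECTION = Path("~/scores").expanduser()
BACKUP = COLLECTION.with_name(COLLECTION.name + ".bak")
OLD_LILYPOND = "lilypond-2.24.0"   # assumed install names
NEW_LILYPOND = "lilypond-2.25.0"

def compile_all(binary: str, source_dir: Path) -> float:
    """Compile every .ly file with the given binary; return elapsed seconds."""
    start = time.monotonic()
    for ly in sorted(source_dir.rglob("*.ly")):
        subprocess.run([binary, "-o", str(ly.parent), str(ly)], check=False)
    return time.monotonic() - start

# 1. Baseline: copy the sources aside and time the old version on the copy.
shutil.copytree(COLLECTION, BACKUP, dirs_exist_ok=True)
old_time = compile_all(OLD_LILYPOND, BACKUP)

# 2. Upgrade the working copy in place with convert-ly.
for ly in sorted(COLLECTION.rglob("*.ly")):
    subprocess.run(["convert-ly", "-e", str(ly)], check=False)

# 3. Time the new version and report old-vs-new elapsed time.
new_time = compile_all(NEW_LILYPOND, COLLECTION)
print(f"old: {old_time:.1f}s  new: {new_time:.1f}s")

A real tool would presumably reuse the regression-test comparison
machinery to report visual differences in the output as well, rather
than just timings.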
Ideally, this could quickly produce lots of good testing information for
development versions of LilyPond, in a way that encourages user
participation.
How much work: I don't know. Nonzero, probably not big.
Keep in mind, however, that changes which generate lots of small
differences land on a regular basis, so you are likely to get mostly
noise from a comparison like this. You can only really do it between
consecutive unstable releases: if you compare the last stable release
with the current unstable release (assuming a few unstable releases
have passed since the stable one), the noise will likely be
overwhelming. For this reason, the testers need to be really dedicated.
Best,
Jean