Re: search for better regtest comparison algorithm
From: Luca Fascione
Subject: Re: search for better regtest comparison algorithm
Date: Mon, 29 Jul 2024 12:10:26 +0200
Hi Han-Wen,
I was actually thinking about this situation upside-down from how you're
seeing it; details below.
On Mon, Jul 29, 2024 at 10:30 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> On Mon, Jul 29, 2024 at 8:56 AM Luca Fascione <l.fascione@gmail.com>
> wrote:
> > [shifts are] going to be some random, non-integer quantity, right?
>
> Yes, but since the comparison works on pixel images, you can't see the
> non-integer part of the shift.
> > Also, the rasterization that gets performed, is it anti-aliased?
>
> Usually it is; we could turn it off and possibly make the images
> larger, but IIRC that slows things down.
>
Actually my thought here is that if you have anti-aliasing on and you
translate the image by a small amount in x and/or y, you alter _all_ the
non-white pixels: this is because the renderer will account for the coverage
of each object with respect to each pixel slightly differently, and because
of this it changes all the shades of gray it generates.
If you render a square that is 1 pixel on a side, aligned with the raster,
you get one black pixel. But if you translate it half a pixel right and down,
you get 4 pixels, each 0.25 gray (after unapplying gamma, assuming you're
using "box 1 1" filtering).
If you're trying to realign one image to the other in this scenario, you can
see it will be quite annoying to do this and recover one image from the
other (in my example, going from the first image to the second would work,
but going from the second to the first won't; the heart of the problem is
that you'd be convolving with the anti-aliasing filter a second time,
instead of deconvolving it).
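The half-pixel example above can be verified numerically. This is a minimal
sketch (not LilyPond's actual rasterizer), assuming "box 1 1" filtering in
linear light, where each pixel's gray value is simply the overlap area
between the square and that pixel; the function name is hypothetical.

```python
def box_coverage(x0, y0, size=1.0, grid=3):
    """Coverage (overlap area) of an axis-aligned square of the given size,
    with its top-left corner at (x0, y0), over each pixel of a grid x grid
    raster. This models box 1x1 filtering in linear light."""
    img = [[0.0] * grid for _ in range(grid)]
    for py in range(grid):
        for px in range(grid):
            # Overlap of the square with pixel [px, px+1) x [py, py+1)
            ox = max(0.0, min(px + 1, x0 + size) - max(px, x0))
            oy = max(0.0, min(py + 1, y0 + size) - max(py, y0))
            img[py][px] = ox * oy
    return img

aligned = box_coverage(0.0, 0.0)   # one pixel fully covered: value 1.0
shifted = box_coverage(0.5, 0.5)   # four pixels, each 0.25
```

Note that both images have the same total ink (the coverages sum to 1.0),
but no pixel value is shared between them, which is why a pixel-wise
comparison sees the shifted image as entirely different.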
> > It would seem that though shifts and changes in the lengths of the
> staves are "common", small and relatively benign problems, rotations and
> scales (magnifications) should be considered major disasters, right?
>
> Rotations do not generally happen. Virtually all the positioning is
> rectilinear, and scaling is also not common. What happens is that objects
> end up in different locations, and sometimes variable objects (slurs,
> beams) have different sizes.
>
Yes, I meant the case where this would happen as a result of new defects
introduced in the code, not as a request from the source.
In other words: if LilyPond emits everything rotated 0.2 degrees clockwise,
some major coding disaster has certainly happened, and the tests should fail
loudly.
On the contrary, if half the staves on a sheet move up by 0.35 pixels, it is
probably because the bbox of a glyph is now a touch tighter, or the
placement of an articulation mark is slightly different, all of which could
in general be considered relatively benign.
Not necessarily desirable, but certainly not a complete "all is on fire"
situation either.
I very much agree with the suggestion brought up elsewhere that tests should
be small.
I must admit I'm not familiar with how specifically LilyPond is tested, but
the ideal situation is:
- each test runs quickly
- each test exercises a very small number of features (ideally one or two)
- the verification checks only specific aspects of the output (a test that
  renders articulations should not check that the console output reports
  the right version of LilyPond, for example)
- this is useful because in many cases it will let you use fairly coarse
  accept/reject thresholds, in that the checking part should produce wildly
  different outcomes for "good" vs "bad"
This will give you hundreds of tests, and running hundreds of tests takes a
long time. This is usually not something most people look forward to dealing
with.
So once you have the above, you add hierarchies to it so you can deploy a
branch-and-bound strategy:
- make bigger tests that check several things at once (these are probably
  approximately the tests you have now, I suspect)
- these will fail under much tighter acceptance criteria (if they pass,
  you're very sure it's all good)
- when these "supertests" fail, the inner tests they cover are run, and a
  report is made containing their outcomes
- when these "supertests" pass, the inner tests are skipped: this is where
  you get the major time savings
But I digress.
--
Luca Fascione