lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Pagination anomaly


From: Vadim Zeitlin
Subject: Re: [lmi] Pagination anomaly
Date: Tue, 6 Feb 2018 01:16:02 +0100

On Mon, 5 Feb 2018 22:04:02 +0000 Greg Chicares <address@hidden> wrote:

GC> Are you generating these 'sample*' product files, as opposed to simply
GC> extracting them from some tarball in lmi.nongnu.org's ftp area?

 I am afraid I don't even actually remember where did I take these ones
from, but I do know that I've edited the file in my data directory manually
because I wanted to see how did this field appear.

GC> If you're running lmi's [non-auto]make files, then these 'sample*'
GC> files are generated and installed, and should have a current date,
GC> which is good; if they date from the last millennium, that's a problem.

 The date of the original copy of the file I used was 2017-04-27, so it's
not quite that old, and comparing it with sample.policy in /opt/lmi/data in
my lmi VM, where the official build system is used, the only difference is
in the copyright year, so this time I was lucky. But I'll definitely think
to update them more regularly in the future, thanks.


GC> I think it's better for me to "apply your pull-request" (or "merge
GC> your branch", or however a git maven would express that).

 "Merge" works with both "PR" and "branch", and I definitely prefer if you
could please do it like this, this makes things simpler for me (and,
hopefully, not at your expense). I post small patches here just to show
them to you immediately, saving you a click on the URL, but I definitely
don't want to imply that you should be applying them by copy-and-pasting
them from email, like we did in the last millennium!

GC> > Unfortunately this code is not easy to test in isolation, so I need to
GC> > either refactor it in some non-trivial and not really natural way to allow
GC> > it, or we need to implement the text output idea from the other thread, 
but
GC> > do it in a way allowing to test the pagination logic. The latter would
GC> > probably be more useful, wouldn't it?
GC> 
GC> This is a hard question. Superficially, yes, of course it would be
GC> beneficial to have text output available as a side effect. But the
GC> real question--how to ensure that pagination is always correct--may
GC> be orthogonal to that. For example, flat-text output blocks contain
GC> an integral number of lines, but a pdf renderer probably needs to
GC> deal with fractional measurements:

 Actually it doesn't, at least not currently.

GC> In some ideal sense it seems that it ought to be possible to abstract
GC> out the pagination code into a small, easily-testable routine. But
GC> AFAICS that would essentially mean taking a vector of the vertical
GC> sizes of a set of blocks, and comparing its total to the available
GC> page size.
GC>   P = Page length
GC>   H = Header (vertical) size
GC>   F = Footer size
GC>   Q = size of each Quinquennial group of table elements
GC> Then the problem is determining the maximum N such that
GC>   P - H - F >= N*Q
GC> with due regard to padding between blocks.

 Yes, it's indeed as simple as that and looks totally trivial. As you can
see, this didn't prevent me from making at least 4 errors in it :-(

GC> >  Second problem is that currently there are 2 functions dealing with page
GC> > breaks: page_with_tabular_report::render() and get_extra_pages_needed().
GC> > This gives me twice as many opportunities to introduce bugs in them (as if
GC> > I needed it), so I wonder if I should remove the latter and just run the
GC> > code from render() twice, once with output disabled, to just compute the
GC> > number of pages needed, and second time for real.
GC> 
GC> Again, I haven't studied this code, but from my naive ivory-tower
GC> perspective this sounds like a Surpassingly Good Idea.

 Yes, I start coming to this conclusion too, even if it means making the
code somewhat more complicated.

GC> > One disadvantage of doing it like this is that
GC> > get_extra_pages_needed() is currently more efficient than render(),
GC> > but I'm not sure if it's really noticeable.
GC> 
GC> Can you measure it easily?

 Well, I couldn't until today, because I didn't have any version of
render() that didn't do any output, but just computed the number of pages.
But after writing my ad hoc test for these functions today, I now can. And,
as usual with performance measurements, they surprised me: running my test
program with 40000000 calls to each function took 0.3s for the version
directly computing the number of pages, as get_extra_pages_needed() does,
and 5.4s for the version doing it by iterating over all years, as render()
does, i.e. a slowdown of 18 times, which is much more than I expected.

 Of course, this is a very artificial benchmark as it does it so many times
and for the number of years up to 200, doing the same 40000000 calls to it
but with the years only up to 100 bring the time of the slow version down
to 2.7s, and while even this is still 9 times slower, it almost certainly
doesn't matter inside lmi where other things should dominate it. But I
won't be sure of this before I can update lmi and re-profile it.

GC> BTW, we currently have a header that only defines
GC> 
GC> enum enum_output_mode
GC>     {e_output_normal
GC>     ,e_output_measure_only
GC> 
GC> and I'd like to move that into 'oecumenic_enumerations.hpp', where
GC> a number of similar enums already reside.

 I actually hesitated to do this, but decided that this enum wasn't
universal enough to merit its place in this header.

GC> In doing so, I'd like to
GC> rename everything so that whole entity is self-documenting--e.g.:
GC> 
GC> enum oenum_render_or_measure
GC>     {oe_render
GC>     ,oe_measure
GC>     };
GC> 
GC> The android documentation speaks of screen rendering as
GC>   "a two pass process: a measure pass and a layout pass"
GC> although they also have a later "draw" step; so I figure that
GC> the meanings of "render" and "measure" in the PDF or HTML context
GC> are unmistakably clear. Any objection?

 No, although I slightly regret the loss of "only" as it underscored that
rendering also does measuring anyhow. But it's almost certainly clear
enough even without it.

GC> > Another possible
GC> > advantage of the current approach is that get_extra_pages_needed() result
GC> > is used as a check for render() and that if (exactly) one of them is 
buggy,
GC> > chances are that we notice it due to the warning which was triggered. Does
GC> > this justify keeping both functions?
GC> 
GC> Brooks [_The Mythical Man Month_, page 64 of the Anniversary Edition] said:
GC>   "Never go to sea with two chronometers; take one or three."

 OT3H there is a well-known take of NASA which sent 2 computers in space,
for redundancy, and a simple comparator which was used to check that both
of them produced the same results. As you can guess, the mission failed
when the comparator broke down.

GC> but that doesn't mean that having two, we should throw one overboard.

 Indeed, if this whole sad story taught me anything, it's that having 2
functions, at least initially, was useful because it allowed us to detect
the problem more easily.

GC> Clearly we're dealing with a hard problem here,

 The problem is embarrassingly simple, but there is just enough
comparisons/divisions in it to make it easy to make a mistake in one of
them... It doesn't make it hard, but it does make it error-prone, and even
if I'm reasonably certain that I did do it correctly, finally, after
writing my test and verifying that it passes for all values, it might still
be worth to keep both functions to ensure it doesn't happen again.

 The main negative of doing it I see is that it will make modifying this
code more difficult in the future, as it will need to be done in 2 (albeit
neighboring) places. However I don't believe this code would need to be
modified often, so it's probably not a big deal.


GC> Usually I try to abbreviate quoted text so that it's not too much longer
GC> than the answer, but that won't work here, so the answer is "Yes":
GC>   {1..35}, {36..70}, {71..74} wastes a page, objectionably
GC>   {1..35}, {36..74} is a tasteful tradeoff of uniformity for brevity
GC> The latter is preferable.

 I've pushed another commit to https://github.com/vadz/lmi/pull/63 to
change this. I would have preferred to make a separate PR for it, but it
would conflict with a previous change, so I've added the new commit,
https://github.com/vadz/lmi/pull/63/commits/fef8e53065d04269663c720529bb6a2e3f2f84d2
to the same PR.

 BTW, I'm hardly proud of the block of code starting at
https://github.com/vadz/lmi/pull/63/commits/fef8e53065d04269663c720529bb6a2e3f2f84d2#diff-40c87898a53505d8d9f64290acee8758R1785
but I didn't find any simpler way of doing this. I must be missing
something, but at this stage I prefer to be as explicit and as certain of
the code being correct as possible, even if it makes it uglier and longer.

 Please let me know if you see any remaining problems and sorry again for
the entire mess,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]