lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Control flow in page_with_tabular_report::render()


From: Greg Chicares
Subject: Re: [lmi] Control flow in page_with_tabular_report::render()
Date: Sun, 11 Feb 2018 20:23:36 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 2018-02-11 18:31, Vadim Zeitlin wrote:
> On Sun, 11 Feb 2018 18:12:03 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> Let me explain what I'd like to do (or at least attempt). In light of:

[...abstracting the algorithm for pagination...]

> GC> the algorithm is general, widely applicable, and hard to get right.
> 
>  Yes, I understand and agree that it's a good idea. I didn't think it was
> worth extracting this logic in a separate table pagination class because I
> thought it was so simple as to be not worth it, but this was experimentally
> shown to be far from being true.

Before examining your recent work in 'miscellany.cpp', I tried writing
it myself, if only to see whether I'd come up with the same thing.
That exercise further convinced me of the difficulty.

> GC> Then I think the pseudocode could be:
> GC> 
> GC>   pages = magic_class.pages();
> GC>   for(p: pages)
> GC>     {
> GC>     print header;
> GC>     rows = magic_class.rows(p);
> GC>     for(r: rows)
> GC>       {
> GC>       print row;
> GC>       // use magic_class somehow to print blank rows as needed
> GC>       // (many ways to do that--no need to choose one now)
> GC>       }
> GC>     print footer with page number;
> GC>     }
> 
>  Yes, but it still has 2 loops and, IMHO, rewriting it as one (while
> certainly possible) wouldn't help with code clarity at all.

It loops over rows, within a loop over years, and that's natural.

Perhaps what confused me most about the render() function in its
original form was that it looked something like this (from memory):

  for(year = 0; year < max;) // Iterate across year, meaning "row"?
    for(;year < max; ++year) // Iterate across year, meaning "row"?

and I thought both loops shared the same loop counter, and thus
looped over the same range. Rewriting the inner loop as "for(;;)"
didn't really clear things up for me...

  for(year = 0; year < max;) // Iterate across year, meaning "row"?
    for(;;)                  // Iterate across "row"

because it still seems to me that, if two loops iterate over the
same set of values, they should be combined into one loop.

Looking back now, I see that this is because I was lost already
at the beginning of the outer loop: for me, it's a...
  https://en.wikipedia.org/wiki/Garden_path_sentence

   what it says                what it really means
  --------------    ------------------------------------------
  for(row: rows) // Iterate not across rows, but across pages!

The intention is more like:

// I can't write lambdas off the top of my head, so...
#define ALL_DONE((year == max))

  year = 0;
  for(;!ALL_DONE;) // For each page...
    for(whatever)  //   For each row...

> GC> I understand that render() doesn't work that way today: instead, it
> GC> waits until it's about to print something, and decides at that point
> GC> whether to print a row, print a blank line, or start a new page.
> GC> But in principle all those decisions can be made in one unit-testable
> GC> class and embodied in some data structures that it can provide.
> 
>  Yes.

What are those data structures? What information do they embody,
and in what fashion? That's what I'm trying to figure out.
Given this preliminary sketch:

/// Spread J rows in groups of K over a number of pages of length L.
///
/// For example, lay out
///   [J] 100 data rows
///   [K] in groups of five, with blank lines between groups
///   [L] on pages with 37 usable lines (net of headers and footers)
///
/// data consists of "rows"; pages consist of "lines"
///
/// Preconditions:
///  0 <= J
///  0 < K <= L

...my candidates for questions we'd like to ask are:
 - how many pages are needed?
 On a "normal" page (not the last):
 - how many groups are there?
 - how many data rows are consumed?
 - how many output lines are printed?
 On the last (which might be the only) page:
 - [same questions, I suppose, with different answers]
Should there be, say...
  one data_page_layout struct for the last page,
  one data_page_layout struct for any and all prior pages, and
  one int for the total number of pages?
Should 'data_page_layout' above be a struct, a tuple, or what?
Instead of two structs (last page, other pages), might we want
a vector<struct> so that the render() logic can just iterate
across it rather than do an "is this the last page yet"
calculation that could be coded incorrectly...especially since
the cardinality of the vector is going to be tiny?

> GC> >  To help with understanding this function code, it probably would be
> GC> > helpful to mention that each (physical) page is composed of a fixed page
> GC> > part, including the header and all the text before the beginning of the
> GC> > table, and the varying part containing the table rows. The former must 
> be
> GC> > rendered outside of the inner loop, while the latter is rendered inside 
> it.
> GC> 
> GC> Printing the top of the page, and printing the bottom, are different
> GC> operations from printing the rows in between. They're already separate
> GC> function calls. The present render() happens to handle the top outside
> GC> the loop, and the bottom inside it, but if I understand this correctly
> GC> there's nothing about the "bottom" handling that requires it to be
> GC> done inside a loop over rows: IOW, nothing that invalidates my
> GC> proposal above. In yet other words, I think the "latter" part here:
> GC> 
> GC> | The former must be rendered outside of the inner loop,
> GC> | while the latter is rendered inside it
> GC> 
> GC> is just an implementation detail: the former "must" be treated that
> GC> way, but the latter could be treated otherwise (outside the loop).
> 
>  I'm sorry, I find this very confusing because it directly contradicts the
> pseudo-code above: the "latter" here is the part containing the actual
> table rows and (I'm tempted to say "of course") it must be done in a loop
> over rows, mustn't it?

I'm sure this is just semantic confusion. We're saying "former"
and "latter", and "top" and "bottom", and you take the latter
to mean everything that isn't a header, while I take it to mean
the footer and page number.

> GC> Here's my burning question:
> GC>   class B           {virtual foo();};
> GC>   class C: public B {};
> GC>   class D: public B {virtual foo() override;};
> GC> For an object of class D, why call C::foo()?
> 
>  Generally speaking, each object must call its base class version of the
> method until it explicitly wants to replace its implementation, which is
> not the goal here.
> 
> GC> Is it that someday, maybe C will override B::foo(), and then
> GC> that new C::foo() will be called automatically without any
> GC> need to change D's implementation?
> 
>  Yes. Alternatively, a completely new class Z might be added to the
> hierarchy between B and C and its foo() will still be called, as expected.
> 
> GC> And, if so, such a change would go against this recommendation
> GC> from the 2001 boost coding guidelines:
> GC> 
> GC> | Override a virtual function at most once in any path up the
> GC> | inheritance hierarchy. No derived class should implement a
> GC> | virtual function that is also implemented in one of its
> GC> | ancestors, except in the class where that function is first
> GC> | declared. This helps to clarify designs by keeping the model
> GC> | simple: the function to be called is either the default
> GC> | implementation, or it is the (single) overrider.
> GC> 
> GC> Would you consider that advice inapplicable today?
> 
>  It's (still) a good advice, and I try to follow it when possible, but
> sometimes it's just not worth it.

What I'm trying to say is: could we please just call B::foo()
instead of C::foo(), because it's much easier for me to read
and I don't think we'll ever want to implement a C::foo()?

> GC> Does "the footer is drawn before drawing anything else" mean...
> GC> 
> GC>  - after drawing the header, and then the rows, the footer is
> GC>    drawn: after header and rows, but before any subsequent
> GC>    drawing; or
> GC> 
> GC>  - when a page is about to be drawn, the footer is drawn,
> GC>    first of all, before anything else: before even the header
> 
>  Currently it's the latter, just because the footer is drawn by the base
> class method and I prefer to always call the base class version first,
> before doing all the rest in the derived classes.

That makes perfect sense to me.

> GC> If we precalculate the number of pages, and the number of
> GC> data rows to show on each (and their spacing), then we've
> GC> split the work into two separate pieces, each of which can
> GC> be simpler than their combination. And the first can be
> GC> unit-tested exhaustively. That's my dream.
> 
>  We can indeed do it like this. Would you like me to change the code in
> this way or do you prefer to do it yourself?

I'd like to collaborate with you on a solution. I don't care if
it takes longer to do that, because it'll give the best result.

As the first step, let's try to figure out what the abstract
function-or-class in 'miscellany.cpp' should do. Would you take
a look at my list of "candidate" capabilities for it, above,
and let me know what you think?

Once that's settled, it might make sense for me to write the
abstract thing and you to write a new render(). At least, that
makes sense to me, because I've given myself the part that I'm
sure I can do, and obviously you can do the part that I can't
readily do (you saw what happened when I tried). And we can do
code review on each other's work, so that at the end everything
will be as flawless as conceivable and we'll both understand
all of it. What do you think?



reply via email to

[Prev in Thread] Current Thread [Next in Thread]