From: Greg Chicares
Subject: [lmi] Testable alternative to compressed PDFs [Was: Please review commit a929271]
Date: Sat, 10 Feb 2018 20:41:05 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 2018-02-07 22:58, Vadim Zeitlin wrote:
> On Wed, 7 Feb 2018 22:37:28 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> > [...] Kim told me that some other vendor offers an option
> GC> > to generate text files for regression testing, as an alternative to PDF
> GC> > files for actual production. We don't know exactly how they do this,
> GC> > but perhaps there's a way to make our new PDF code emit flat text as an
> GC> > optional side effect. That would enable automated regression testing of
> GC> > a PDF file's content only, not its physical layout; but it would still
> GC> > be very useful.

[...placing the options in a numbered list for future reference...]

>  I can certainly implement this in a relatively straightforward way at our code
> level (although it would still take some time).

1. Generate text output as an optional adjunct to PDFs.

This is a brutally direct approach. It takes more work, but would
be extremely flexible and robust.

> However I should also
> mention that it's relatively straightforward to extract the text from the
> PDF files that we already generate.

2. Extract text from PDFs.

This takes less development work, but sounds like an inconvenient
extra step, which would depend on adding yet another tool to our
toolkit (which takes effort for all of us to learn, and puts us
at the mercy of another third party). I think Kim already has some
non-free tool that does this, and hopes for a better solution.

> And, in fact, uncompressed PDFs (and
> it's trivial to either disable compression when generating them with
> wxPdfDocument or decompress them afterwards) are also surprisingly
> readable.

3. Write uncompressed PDFs.

I didn't even realize this was possible. Can you show me how to
do it, so that I can experiment with it?

>  So, depending on what exactly we want to do, outputting text might not
> be the best solution. Before starting on it, it would, IMHO, be better
> to clearly understand what we are going to do with the generated text
> files. Do you know the answer to this already and, if so, could you
> please explain it to me?

It's hard to say much more than I did above...

> GC> > [...] enable automated regression testing of
> GC> > a PDF file's content only, not its physical layout

...without actual examples of text output. Perhaps we'd be able to
use them the same way we now use '.test' files for system testing.
You probably haven't worked with those, but you should be able to
create one with this command, adapted from 'gwc/develop1.txt':

/opt/lmi/bin[0]$wine ./lmi_cli_shared --file=/opt/lmi/src/lmi/sample.ill \
  --accept --ash_nazg --data_path=/opt/lmi/data --emit=emit_test_data
/opt/lmi/bin[0]$ls -o /opt/lmi/src/lmi/sample.test
-rw-r--r-- 1 greg 75840 Feb 10 18:47 /opt/lmi/src/lmi/sample.test

Roughly speaking, it shows all the data in every available column,
one datum per line, and all the scalar data. For system testing,
we generate about 1200 of these files, and compare each one to a
touchstone (saved, in a separate directory, from a previous run)
using tools like diff.
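To make the touchstone comparison concrete, here is a minimal sketch in Python. It is not how lmi's system test is actually driven (that is done with makefiles and diff); the function name, the directory arguments, and the "missing from test run" message are all assumptions made for illustration.

```python
import difflib
from pathlib import Path


def compare_to_touchstone(test_dir, touchstone_dir, suffix=".test"):
    """Return {filename: diff lines} for files that differ or are missing.

    Hypothetical helper: both directory arguments are illustrative,
    not paths used by lmi itself.
    """
    discrepancies = {}
    for touchstone in sorted(Path(touchstone_dir).glob(f"*{suffix}")):
        candidate = Path(test_dir) / touchstone.name
        if not candidate.exists():
            discrepancies[touchstone.name] = ["missing from test run"]
            continue
        old = touchstone.read_text().splitlines()
        new = candidate.read_text().splitlines()
        # A unified diff is empty when the two files are identical.
        delta = list(
            difflib.unified_diff(old, new, "touchstone", "test", lineterm="")
        )
        if delta:
            discrepancies[touchstone.name] = delta
    return discrepancies
```

An empty result means every file in the touchstone directory has an identical counterpart in the current run. Because the inputs are flat text with one datum per line, each reported difference points directly at the datum that changed.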

But don't focus on the particulars of that layout. We might come
up with something much more like 'cgi.touchstone'. What's important
is what that file has in common with '.test' files: it's flat text
that's easily compared with the usual *nix command-line tools.

We can even write our own custom tools, like 'ihs_crc_comp', whose
output is postprocessed in 'workhorse.make' with sed
          -e '/rel err.*e-0*1[5-9]/d' \
          -e '/abs.*0\.00.*rel/d' \
          -e '/abs diff: 0 /d'
to filter out "uninteresting" numeric discrepancies. That example
shows, I hope, why we can't easily foresee exactly what sort of
flat-text output we want: the test output may evolve along with
the way we use it.
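For anyone more comfortable reading Python than sed, the same filtering idea can be sketched as below. The three patterns are copied unchanged from the sed command above; the function name is hypothetical, and the sample lines in the usage note are invented, since the real output format of 'ihs_crc_comp' is not shown here.

```python
import re

# The three sed deletion patterns quoted above, transcribed unchanged.
UNINTERESTING = [
    re.compile(r"rel err.*e-0*1[5-9]"),  # relative error around 1e-15 to 1e-19
    re.compile(r"abs.*0\.00.*rel"),      # absolute difference containing "0.00"
    re.compile(r"abs diff: 0 "),         # absolute difference exactly zero
]


def drop_uninteresting(lines):
    """Keep only the lines that match none of the patterns (like sed's /.../d)."""
    return [
        line
        for line in lines
        if not any(pattern.search(line) for pattern in UNINTERESTING)
    ]
```

With invented input such as "abs diff: 0 rel err: 0", "abs diff: 1e-10 rel err: 2.3e-016", "abs diff: 0.004 rel err: 1e-03", and "abs diff: 12.5 rel err: 4e-02", only the last line survives. Changing what counts as "uninteresting" is then a matter of editing the pattern list, which is the kind of evolution described above.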


