
Re: [pdf-devel] Problems in the stream implementation


From: jemarch
Subject: Re: [pdf-devel] Problems in the stream implementation
Date: Thu, 02 Oct 2008 23:24:58 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

Hi.

   While writing the LZW filter implementation some doubts arose (I'm
   a corner case maniac, sorry :)):

It is good to have a corner case maniac in the group, and your
identification of this problem is good proof of that :)

   I have just crawled through the code a little to find a solution
   without spamming the list, and I have found a very bad bug: a filter
   cannot generate more data than it takes!

I have installed a patch in the trunk fixing this issue and some other
bugs in the stream module code.

The fix involved some changes in the design, and thus the interface
with the filter implementations has changed.

The prototypes of the functions for the filter implementations are
identical to the previous ones:

  pdf_status_t pdf_stm_f_X_init (pdf_hash_t params, void **state);
  pdf_status_t pdf_stm_f_X_apply (pdf_hash_t params, void *state, pdf_bool_t finish_p);
  pdf_status_t pdf_stm_f_X_dealloc_state (void *state);
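As a minimal sketch of this three-function contract, here is a
hypothetical "copy" filter (X = copy). The status enum, the pdf_* type
stand-ins and the buf_t buffer are defined only for this example and are
not the library's definitions. Since the prototypes carry no buffer
arguments, the sketch also assumes that the filter reaches its input and
output buffers through its private state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the library's types; not the real definitions. */
typedef enum { PDF_OK, PDF_ERROR, PDF_ENINPUT, PDF_ENOUTPUT, PDF_EEOF } pdf_status_t;
typedef int pdf_bool_t;
#define PDF_FALSE 0
#define PDF_TRUE 1
typedef void *pdf_hash_t;

/* Hypothetical buffer: unread data lies in [rp, wp); writes go to wp. */
typedef struct { char *data; size_t size, rp, wp; } buf_t;

/* Assumption: the filter reaches its buffers through its private state. */
typedef struct { buf_t *in, *out; } copy_state_t;

pdf_status_t
pdf_stm_f_copy_init (pdf_hash_t params, void **state)
{
  (void) params;                       /* the copy filter takes no parameters */
  copy_state_t *st = calloc (1, sizeof *st);
  if (st == NULL)
    return PDF_ERROR;                  /* assumed error code for this sketch */
  *state = st;
  return PDF_OK;
}

pdf_status_t
pdf_stm_f_copy_apply (pdf_hash_t params, void *state, pdf_bool_t finish_p)
{
  copy_state_t *st = state;
  (void) params;
  (void) finish_p;                     /* no finalization data to emit */
  assert (st != NULL && st->in != NULL && st->out != NULL);

  while (st->in->rp < st->in->wp)
    {
      if (st->out->wp == st->out->size)
        return PDF_ENOUTPUT;           /* output full, unread input remains */
      st->out->data[st->out->wp++] = st->in->data[st->in->rp++];
    }
  return PDF_ENINPUT;                  /* input buffer fully consumed */
}

pdf_status_t
pdf_stm_f_copy_dealloc_state (void *state)
{
  free (state);
  return PDF_OK;
}
```

In this sketch 'init' allocates the state once, 'apply' may be called
many times against that same state as buffers fill and empty, and
'dealloc_state' releases it.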

The 'apply' function can now return:

   PDF_ENINPUT 

       The filter implementation has processed all the input buffer
       and is ready to receive more input when available, via a new
       call to the 'apply' function. 

       It is assumed that the input buffer is empty after the apply
       function returns this value.

   PDF_ENOUTPUT

       The filter implementation needs more room in the output
       buffer, and is ready to fill it when it becomes available, via
       a new call to the 'apply' function.

       It is assumed that the output buffer is full after the apply
       function returns this value.

   PDF_ERROR

       There is an error in the data processed by the filter. If the
       filter implementation returns this value, then the 'apply'
       function won't be called again without a previous call to 'init'.

   PDF_EEOF

       This value should be returned in the following situations:

         - There has been an EOF condition produced by some
           characteristics in the input data. Only the filters
           interpreting EOD markers or working in fixed-size blocks of
           data will return PDF_EEOF due to this reason.

         - The filter implementation has emitted data due to a
           finalization request (via the finish_p parameter). Only the
           filters supporting finalization data will return PDF_EEOF
           due to this reason.
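To see these return values working together, here is a minimal sketch
of an ASCII Hex encoder written against the rules above. It is not the
library's ASCII Hex filter: the status enum, buf_t, the state layout
(buffers reached through the state, plus a hex digit still waiting to
be written) and the names sketch_ahexenc_apply, drain and encode_all are
all hypothetical. Keeping the pending digit in the state lets the
encoder make progress even through a one-byte output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's types and status codes. */
typedef enum { PDF_OK, PDF_ERROR, PDF_ENINPUT, PDF_ENOUTPUT, PDF_EEOF } pdf_status_t;
typedef int pdf_bool_t;
#define PDF_FALSE 0
#define PDF_TRUE 1
typedef void *pdf_hash_t;

/* Hypothetical buffer: unread data lies in [rp, wp); writes go to wp. */
typedef struct { char *data; size_t size, rp, wp; } buf_t;

/* Assumed state: the buffers, plus a hex digit not yet written (-1 if none). */
typedef struct { buf_t *in, *out; int pending; } ahexenc_state_t;

static const char hex_digits[] = "0123456789ABCDEF";

pdf_status_t
sketch_ahexenc_apply (pdf_hash_t params, void *state, pdf_bool_t finish_p)
{
  ahexenc_state_t *st = state;
  (void) params;

  for (;;)
    {
      /* Emit a low-nibble digit left over from a previous full buffer. */
      if (st->pending >= 0)
        {
          if (st->out->wp == st->out->size)
            return PDF_ENOUTPUT;
          st->out->data[st->out->wp++] = hex_digits[st->pending];
          st->pending = -1;
        }
      if (st->in->rp == st->in->wp)
        break;                              /* input buffer is empty */
      if (st->out->wp == st->out->size)
        return PDF_ENOUTPUT;
      unsigned char c = (unsigned char) st->in->data[st->in->rp++];
      st->out->data[st->out->wp++] = hex_digits[c >> 4];
      st->pending = c & 0x0F;               /* written on the next pass */
    }

  if (!finish_p)
    return PDF_ENINPUT;

  /* Finalization: the filter may assume room for its EOD marker. */
  assert (st->out->wp < st->out->size);
  st->out->data[st->out->wp++] = '>';
  return PDF_EEOF;
}

/* Copy the bytes held in OUT to DST + N, empty OUT, return the new count. */
static size_t
drain (buf_t *out, char *dst, size_t n)
{
  for (size_t i = 0; i < out->wp; i++)
    dst[n++] = out->data[i];
  out->wp = 0;
  return n;
}

/* Hypothetical driver: encode LEN bytes of SRC into DST through an output
   buffer of CACHE_SIZE bytes; returns the number of bytes written. */
size_t
encode_all (const char *src, size_t len, char *dst, size_t cache_size)
{
  char cache[64];
  buf_t in = { (char *) src, len, 0, len };
  buf_t out = { cache, cache_size, 0, 0 };
  ahexenc_state_t st = { &in, &out, -1 };
  size_t n = 0;
  pdf_status_t ret;

  assert (cache_size >= 1 && cache_size <= sizeof cache);
  while ((ret = sketch_ahexenc_apply (NULL, &st, PDF_FALSE)) == PDF_ENOUTPUT)
    n = drain (&out, dst, n);               /* make room, then call again */
  assert (ret == PDF_ENINPUT);
  n = drain (&out, dst, n);                 /* guarantees room for '>' */
  ret = sketch_ahexenc_apply (NULL, &st, PDF_TRUE);
  assert (ret == PDF_EEOF);
  return drain (&out, dst, n);
}
```

The driver reacts to PDF_ENOUTPUT by emptying the output buffer and
calling 'apply' again, and it drains once more before finalizing so that
the filter's assumption of room for '>' holds. An encoder never sees
invalid input, so it never returns PDF_ERROR; a decoder rejecting a
non-hex character would be the natural place for that value.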

Regarding the finalization of the streams, there is a change in the
stream API: instead of using an explicit 'pdf_stm_finish' function,
there is now a new parameter to the 'pdf_stm_flush' function:

    pdf_size_t pdf_stm_flush (pdf_stm_t stm, pdf_bool_t finish_p);

If FINISH_P is PDF_TRUE then the filter chain will be flushed with
finalization.

The design limitations of filter finalization have been fixed, at
least to some extent. Consider a filter chain like:

  
        F4 <--- F3 <--- F2 <--- F1 <--- F0

After 'pdf_stm_flush' is called with finish_p = PDF_TRUE, the chain is
flushed: when the application of some filter FN-1 returns PDF_EEOF, the
apply function of the filter implementation of FN is called with
finish_p == PDF_TRUE.

When finish_p == PDF_TRUE, the input buffer of the called filter
implementation is empty, and any data generated by FN-1, FN-2, ...,
F0, including their EOD markers, has _already been processed_. The
filter implementation should return PDF_EEOF after generating its EOD
marker.

A limitation remains: when finish_p is PDF_TRUE, the filter can assume
that there is room in the output buffer to hold its whole EOD
marker. Currently the minimum size is 1 byte, since the EOD marker of
the ASCII Hex encoder is '>'.
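Because each finalizing filter writes its whole EOD marker in one call,
the minimum room is simply the length of the longest EOD marker in the
chain. A small sketch with a hypothetical filter_info_t table follows;
the markers themselves ('>' for ASCII Hex, '~>' for ASCII 85) are the
EOD markers defined by the PDF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filter description: the EOD marker the filter writes
   when finalized ("" if it writes none). */
typedef struct { const char *name; const char *eod_marker; } filter_info_t;

/* Smallest output room that lets every filter in the chain write its
   whole EOD marker in one finalizing call: the longest marker length. */
size_t
min_finish_room (const filter_info_t *chain, size_t n)
{
  size_t room = 0;

  assert (n == 0 || chain != NULL);
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (chain[i].eod_marker);
      if (len > room)
        room = len;
    }
  return room;
}
```

A chain containing only the ASCII Hex encoder needs 1 byte, while adding
an ASCII 85 encoder raises the requirement to 2.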
 
You can look at the ASCII Hex filters as an example of how to use the
new semantics.

PS: I have changed the cache size of the stream used in the pdf-filter
    utility to 1, to make it easier to catch possible problems. Of
    course, after the ASCII 85 implementation we will need to increase
    the cache size to 2, since the ASCII 85 EOD marker '~>' is two
    bytes long.




