pdf-devel

[pdf-devel] Re: Initial API for the tokeniser module


From: Michael Gold
Subject: [pdf-devel] Re: Initial API for the tokeniser module
Date: Fri, 15 May 2009 17:09:57 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, May 15, 2009 at 14:17:59 +0200, address@hidden wrote:
...
>    >    In normal operation this should be decided automatically, maybe
>    >    based on a policy configured for the stream writer (e.g., "avoid
>    >    8-bit characters").
>    > 
>    > Since there are several ways to get the printed representation of a
>    > token, 
> 
>    I guess you meant to write more here.
> 
> Heh, yes :)
> 
> I wanted to say that, since there are several ways to get the printed
> representation of a token, we should provide fine-grained control to
> the client at this level. Upper layers of the library will take care
> of, for example, PDF x.y compatibility.

So the token writer should output a normal string when no special flags
are given to pdf_token_write, and let an upper layer give a flag when it
wants a hex string?
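A minimal sketch of that split, assuming a hypothetical PDF_TOKEN_HEX_STRINGS flag (not an existing library flag): with no flags the writer prints a literal string, and an upper layer that wants a hex string has to ask for it. The names write_string_token and put_char are illustrative only.

```c
#include <stddef.h>

/* Hypothetical flag, for illustration only. */
#define PDF_TOKEN_HEX_STRINGS 0x1

/* Append one character to 'out', always keeping room for the final NUL. */
static int put_char(char *out, size_t outsz, size_t *n, char c)
{
    if (*n + 1 >= outsz)
        return -1;
    out[(*n)++] = c;
    return 0;
}

/* Print the bytes of a string token into 'out' as a NUL-terminated PDF
   string.  Without flags a literal string "(...)" is produced; with
   PDF_TOKEN_HEX_STRINGS a hex string "<...>" is produced instead.
   Returns the printed length (excluding the NUL), or -1 if 'out' is
   too small. */
long write_string_token(const unsigned char *data, size_t len, int flags,
                        char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0, i;

    if (flags & PDF_TOKEN_HEX_STRINGS) {
        if (put_char(out, outsz, &n, '<'))
            return -1;
        for (i = 0; i < len; i++) {
            /* Two hex digits per byte, high nibble first. */
            if (put_char(out, outsz, &n, hex[data[i] >> 4]) ||
                put_char(out, outsz, &n, hex[data[i] & 0x0F]))
                return -1;
        }
        if (put_char(out, outsz, &n, '>'))
            return -1;
    } else {
        if (put_char(out, outsz, &n, '('))
            return -1;
        for (i = 0; i < len; i++) {
            unsigned char c = data[i];
            /* Escape parentheses and backslash; escape CR because a raw
               end-of-line inside a literal string is read back as LF. */
            if (c == '(' || c == ')' || c == '\\' || c == '\r') {
                if (put_char(out, outsz, &n, '\\') ||
                    put_char(out, outsz, &n, c == '\r' ? 'r' : (char)c))
                    return -1;
            } else if (put_char(out, outsz, &n, (char)c)) {
                return -1;
            }
        }
        if (put_char(out, outsz, &n, ')'))
            return -1;
    }
    out[n] = '\0';
    return (long)n;
}
```

For example, the bytes a(b come out as (a\(b) by default and as <612862> with the flag. Always escaping parentheses is simpler than checking whether they are balanced, at the cost of a few extra bytes.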

> The same applies to the interpretation of the textual forms for
> tokens.

Are you referring just to '#' escapes here?  I think that's the only
case where a token can be interpreted in two different ways.
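To illustrate the ambiguity: the name /A#20B decodes to the three bytes "A B" when '#' is treated as an escape, but stays as the five characters "A#20B" when it is not (the PDF 1.1 behaviour the _PDF11 flag refers to). A sketch of the decoding step, where decode_name and its sharp_escapes parameter are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if 'c' is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the characters of a name token (the text after '/').
   When 'sharp_escapes' is true, "#xx" is replaced by the byte 0xXX;
   when false, '#' is an ordinary character.  Returns the decoded
   length, or -1 on a malformed escape or if 'out' (capacity 'outsz')
   is too small.  The output is not NUL-terminated. */
long decode_name(const char *in, size_t len, char *out, size_t outsz,
                 bool sharp_escapes)
{
    size_t i = 0, n = 0;

    while (i < len) {
        char byte;
        if (sharp_escapes && in[i] == '#') {
            int hi, lo;
            if (i + 2 >= len)
                return -1;  /* '#' must be followed by two hex digits */
            hi = hex_digit(in[i + 1]);
            lo = hex_digit(in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1;
            byte = (char)(hi * 16 + lo);
            i += 3;
        } else {
            byte = in[i++];
        }
        if (n >= outsz)
            return -1;
        out[n++] = byte;
    }
    return (long)n;
}
```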

...
>    For now there's no implementation that can return EAGAIN since there's
>    no way to set a non-blocking mode.  At some point a function to do that
>    should be added.
> 
> Maybe it would be good to write a little read-only HTTP filesystem and
> incorporate it into the library.  That would allow us to test the
> non-blocking capabilities of pdf_fsys and also the read-in-advance
> functions.  It would also work as an example of a non-disk-based
> filesystem for people interested in writing their own fsys
> implementations.

That's a good idea.  Whoever implements this should look into using a
library like libcurl to handle the protocol (and possibly give us other
protocols for free).
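As a rough illustration of what a non-blocking backend involves on POSIX systems, the sketch below sets O_NONBLOCK on a descriptor and maps a read that fails with EAGAIN/EWOULDBLOCK to a distinct status instead of blocking. The enum and the functions set_nonblocking and read_some are hypothetical and are not part of pdf_fsys.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes, standing in for the library's own. */
enum read_status { READ_OK, READ_EOF, READ_EAGAIN, READ_ERROR };

/* Switch 'fd' to non-blocking mode.  Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read up to 'size' bytes.  On a non-blocking descriptor with no data
   available yet, report READ_EAGAIN instead of blocking, so the caller
   can retry later. */
enum read_status read_some(int fd, void *buf, size_t size, size_t *nread)
{
    ssize_t r;

    *nread = 0;
    do {
        r = read(fd, buf, size);
    } while (r == -1 && errno == EINTR);  /* retry if interrupted by a signal */

    if (r > 0) {
        *nread = (size_t)r;
        return READ_OK;
    }
    if (r == 0)
        return READ_EOF;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_EAGAIN;
    return READ_ERROR;
}
```

On an empty pipe whose write end is still open, read_some returns READ_EAGAIN; once data is written it returns READ_OK, and after the writer closes its end it returns READ_EOF.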

...
>    Also, I had two flags for the token reader:
>      _RET_COMMENTS  (return comments as tokens)
>      _PDF11         (don't treat '#' as an escape character in names)
>    I didn't have a public function to set them, but one would need to be
>    added.  How about
>      void pdf_token_reader_set_flags(int flags),
>    where 'flags' is a bitmask?  We may also need a _PDF11 flag for the
>    writer.
> 
> I would use _SHARP_ESCAPE instead of _PDF11. Upper layers will take
> care of PDF x.y portability.

I think something like NO_SHARP_ESCAPE/NO_NAME_ESCAPE would be better
(i.e. the default would be to interpret the escapes, so no flags would
be needed in the common case).  Also, the flags should probably be
specified for token_read rather than a separate reader_set_flags call,
for consistency with token_write.
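A sketch of the per-call approach, with hypothetical flag names using a PDF_TOKEN_ prefix (one of the prefixes under discussion, not a settled choice) and a zero default for the common case. The function next_token_start only shows how the comment flag might be consulted inside a read call; it is not the tokeniser's actual code.

```c
#include <stddef.h>

/* Hypothetical flag names; the bit values are illustrative only.
   0 means the common case: skip comments, interpret '#' escapes. */
enum {
    PDF_TOKEN_RET_COMMENTS   = 1 << 0,  /* return comments as tokens */
    PDF_TOKEN_NO_NAME_ESCAPE = 1 << 1   /* treat '#' in names literally */
};

/* PDF whitespace, except NUL: this sketch uses NUL-terminated input. */
static int is_pdf_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Return a pointer to the start of the next token in 'p', honouring the
   per-call 'flags'.  Comments ('%' up to end of line) are skipped unless
   PDF_TOKEN_RET_COMMENTS is set, in which case the comment is itself the
   next token. */
const char *next_token_start(const char *p, int flags)
{
    for (;;) {
        while (is_pdf_space(*p))
            p++;
        if (*p == '%' && !(flags & PDF_TOKEN_RET_COMMENTS)) {
            while (*p != '\0' && *p != '\n' && *p != '\r')
                p++;
            continue;  /* then skip the end of line and any further comments */
        }
        return p;
    }
}
```

A caller that needs PDF 1.1 behaviour and wants comments would pass PDF_TOKEN_RET_COMMENTS | PDF_TOKEN_NO_NAME_ESCAPE; everyone else passes 0.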

> Also, I think that it would be good to use the PDF_ prefix for these
> flags.

Just "PDF_", or something like "PDF_TOKENISER_"?

-- Michael


