
From: Michael Gold
Subject: Re: [pdf-devel] Re: Modifications on pdf_token_read to get token boundaries
Date: Tue, 26 May 2009 00:11:06 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Tue, May 26, 2009 at 03:18:02 +0200, David Vazquez wrote:
> 
> Michael Gold <address@hidden> writes:
> 
> > On Tue, May 26, 2009 at 00:53:46 +0200, address@hidden wrote:
> >> We need to be able to determine the boundaries of a token that has
> >> been read, for error reporting. We cannot rely on the stm used by the
> >> token reader to determine the beginning position of a read token,
> >> since the reader skips whitespace characters before each token.
> >
> > This behaviour could be changed by
> >  - adding a flag that causes token_read to return whitespace as a token;
> >    or,
> >  - adding a function/flag to advance to the beginning of the next token
> >
> 
> I think it would be preferable for this not to be the tokeniser module's job.
> 
> Adding a new function for this seems nice to me. However, we could
> simplify the API if the `pdf_token_reader_new' function consumed
> characters up to the first token, and `pdf_token_read' did the same
> after reading each token. We could then assume the next token always
> starts at the current position of the stream, and use the
> `pdf_stm_tell' function to get the beginning of each token.
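The separate-function option (an "advance to the beginning of the next token" call) can be sketched roughly as follows. This is a minimal illustration, not the GNU PDF API: `struct mem_stm` is a hypothetical in-memory stand-in for a pdf_stm_t, its `pos` field plays the role of pdf_stm_tell, and `advance_to_next_token` is a hypothetical name. Comments (`%...`) are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a stream; NOT the GNU PDF pdf_stm_t API. */
struct mem_stm {
  const unsigned char *buf;
  size_t len;
  size_t pos;   /* the offset a pdf_stm_tell-like call would report */
};

/* The six PDF whitespace characters (ISO 32000-1, section 7.2.2). */
static int is_pdf_white(unsigned char c)
{
  return c == 0x00 || c == 0x09 || c == 0x0A
      || c == 0x0C || c == 0x0D || c == 0x20;
}

/* Hypothetical "advance to the beginning of the next token" function.
   On return, s->pos is the start offset of the next token.  Returns 1
   if a token follows, 0 at end of stream.  Comments are not handled. */
static int advance_to_next_token(struct mem_stm *s)
{
  assert(s != NULL);
  while (s->pos < s->len && is_pdf_white(s->buf[s->pos]))
    s->pos++;
  return s->pos < s->len;
}
```

A caller would invoke this, record the position as the token's start offset, and only then call the token reader, so the tokeniser itself never has to report where a token began.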

Skipping whitespace in the constructor would complicate things, since
the constructor (and presumably pdf_token_reader_reset) would then be
able to return the same error codes as pdf_token_read; and if an error
like PDF_EAGAIN occurred after reading some data, there would be no sane
way to handle it since the constructor doesn't return an object on
failure.
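To make the constructor problem concrete, here is a small sketch, assuming a hypothetical non-blocking stream (`struct chunk_stm`) that returns a hypothetical `ST_EAGAIN` when only part of the data has arrived, and a hypothetical reader that must remember whether it stopped inside a `%` comment. Because the skip runs on an existing reader object, the caller can simply retry after `ST_EAGAIN`; if the same work were done in a constructor that returns nothing on failure, that state would be lost.

```c
#include <assert.h>
#include <stddef.h>

enum status { ST_OK, ST_EAGAIN, ST_EOF };

/* Hypothetical non-blocking stream: only the first `avail` of `len`
   bytes can be read now; reading further yields ST_EAGAIN. */
struct chunk_stm {
  const char *buf;
  size_t avail, len, pos;
};

/* Hypothetical reader state that must outlive an ST_EAGAIN. */
struct reader {
  struct chunk_stm *stm;
  int in_comment;   /* data ran out in the middle of a %-comment */
};

static int is_white(char c)
{
  return c == '\0' || c == '\t' || c == '\n'
      || c == '\f' || c == '\r' || c == ' ';
}

/* Skips whitespace and comments.  All progress lives in *r and *r->stm,
   so after ST_EAGAIN the caller can call again on the same r.
   On ST_OK, r->stm->pos is the start of the next token. */
static enum status reader_skip(struct reader *r)
{
  struct chunk_stm *s;

  assert(r != NULL && r->stm != NULL);
  s = r->stm;
  for (;;) {
    char c;

    if (s->pos >= s->avail)
      return s->pos < s->len ? ST_EAGAIN : ST_EOF;
    c = s->buf[s->pos];
    if (r->in_comment) {
      if (c == '\r' || c == '\n')
        r->in_comment = 0;   /* end of line ends the comment */
    } else if (c == '%') {
      r->in_comment = 1;
    } else if (!is_white(c)) {
      return ST_OK;          /* token starts here; do not consume it */
    }
    s->pos++;
  }
}
```

With input `"% hi\n  42"` and only three bytes available, the first call stops inside the comment with `ST_EAGAIN`. Retrying on the same reader correctly lands on `42`, whereas a freshly constructed reader resuming at the same stream position would mistake the rest of the comment (`i`) for a token.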

Similarly, skipping whitespace after a token would be awkward because
pdf_token_read would have to return a failure code (and hang onto the
finished token) if it got an error while reading the whitespace.
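The awkwardness of consuming whitespace after each token can be sketched as follows, again assuming a hypothetical non-blocking stream and hypothetical names (`reader_read`, `ST_EAGAIN`); tokens are simplified to runs of non-whitespace bytes rather than real PDF lexing. When the data runs out while skipping trailing whitespace, the token is already complete but cannot be returned, so the reader has to report `ST_EAGAIN` and keep the finished token for the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum status { ST_OK, ST_EAGAIN, ST_EOF };

/* Hypothetical non-blocking stream: only `avail` of `len` bytes are readable now. */
struct chunk_stm {
  const char *buf;
  size_t avail, len, pos;
};

/* Hypothetical reader that also consumes whitespace AFTER each token. */
struct reader {
  struct chunk_stm *stm;
  char tok[32];   /* token text, held across ST_EAGAIN */
  size_t tlen;
  int finished;   /* token complete; only trailing whitespace remains */
};

static int is_white(char c)
{
  return c == '\0' || c == '\t' || c == '\n'
      || c == '\f' || c == '\r' || c == ' ';
}

static void reader_init(struct reader *r, struct chunk_stm *s)
{
  memset(r, 0, sizeof *r);
  r->stm = s;
}

/* Returns ST_OK with the token copied to out, ST_EAGAIN (token kept
   in *r), or ST_EOF. */
static enum status reader_read(struct reader *r, char *out, size_t outsz)
{
  struct chunk_stm *s;

  assert(r != NULL && out != NULL && outsz > 0);
  s = r->stm;
  for (;;) {
    char c;

    if (s->pos >= s->avail) {
      if (s->pos < s->len)
        return ST_EAGAIN;       /* token (partial or finished) stays in *r */
      if (r->tlen == 0)
        return ST_EOF;
      break;                    /* end of data also ends the token */
    }
    c = s->buf[s->pos];
    if (is_white(c)) {
      if (r->tlen > 0)
        r->finished = 1;        /* now consuming trailing whitespace */
    } else if (r->finished) {
      break;                    /* next token starts at s->pos */
    } else if (r->tlen + 1 < sizeof r->tok) {
      r->tok[r->tlen++] = c;    /* over-long tokens are truncated */
    }
    s->pos++;
  }
  r->tok[r->tlen] = '\0';
  snprintf(out, outsz, "%s", r->tok);
  r->tlen = 0;
  r->finished = 0;
  return ST_OK;
}
```

Reading `"obj  42"` with only four bytes available yields `ST_EAGAIN` even though `obj` is already complete; only the next call, once more data has arrived, can deliver it. Every caller has to cope with that "success deferred as failure" case.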

> Finally, I think we will not need the ending offset of a token.

I don't see why it would be needed, but it's trivial to determine since
the tokeniser never reads past the end of a token (except for peeking
one byte ahead when necessary).
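Why the end offset is trivial can be shown with a tiny sketch, assuming a hypothetical in-memory stream where peeking does not advance the position (if a real tokeniser instead consumed the peeked byte into its own buffer, one would subtract one). Reading the integer in `[42]` requires looking at `]` to know the number has ended, but since that byte is only peeked, the position after the read is exactly the token's exclusive end offset. All names here are illustrative, not the GNU PDF API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stream; pos is what a tell call would report. */
struct mem_stm {
  const char *buf;
  size_t len, pos;
};

/* Look at the next byte without consuming it; -1 at end of data. */
static int stm_peek(const struct mem_stm *s)
{
  return s->pos < s->len ? (unsigned char)s->buf[s->pos] : -1;
}

static int stm_get(struct mem_stm *s)
{
  int c = stm_peek(s);
  if (c != -1)
    s->pos++;
  return c;
}

/* Reads an unsigned integer token at s->pos.  The byte that ends the
   number is only peeked, so the position afterwards is the (exclusive)
   end offset of the token.  Overflow is ignored in this sketch.
   Returns the token length in bytes. */
static size_t read_integer(struct mem_stm *s, size_t *start, size_t *end,
                           long *value)
{
  assert(s != NULL && start != NULL && end != NULL && value != NULL);
  *start = s->pos;
  *value = 0;
  while (stm_peek(s) >= '0' && stm_peek(s) <= '9')
    *value = *value * 10 + (stm_get(s) - '0');
  *end = s->pos;
  return *end - *start;
}
```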

-- Michael
