pdf-devel



From: Michael Gold
Subject: Re: Re : [pdf-devel] Tokeniser Module - Unit Test: test cases and suggestions
Date: Wed, 21 Oct 2009 20:10:28 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Oct 21, 2009 at 14:04:38 -0700, Pierre Filot wrote:
> >> test 4: "\LF", "\CR" and "\CR+LF" are not considered part of the string
> >> (still to be tested: "\LF", "\CR")
...
> I meant here a backslash followed by an EOL:

OK, that makes sense.  For tests involving EOL markers, you could also
see what happens with two consecutive ones (e.g. "CR LF CR LF", "LF CR",
"CR CR", etc.).
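
As a rough aid for writing those cases, here is a standalone sketch
that computes how many EOL markers a byte sequence contains, assuming
CR, LF and CR LF each count as one marker.  The helper name
count_eol_markers is made up for illustration and is not part of the
library.

```c
#include <assert.h>
#include <stddef.h>

/* Count EOL markers in buf[0..len), treating CR, LF and CR LF as one
 * marker each.  Hypothetical test helper, not a library function. */
static size_t
count_eol_markers (const char *buf, size_t len)
{
  size_t i = 0, count = 0;

  assert (buf != NULL || len == 0);
  while (i < len)
    {
      if (buf[i] == '\r')
        {
          count++;
          /* CR LF is a single marker: swallow the LF. */
          if (i + 1 < len && buf[i + 1] == '\n')
            i++;
        }
      else if (buf[i] == '\n')
        count++;
      i++;
    }
  return count;
}
```

With this, "CR LF CR LF", "LF CR" and "CR CR" all count as two markers,
which gives the expected values to compare the tokeniser against.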

> >Extra opening parentheses can't really be detected (except when we reach
> >EOF with an open string).
> >this shouldn't occur outside the STRING and COMMENT states
> 
> Well, I was thinking of unbalanced parentheses within a literal string.
> The thing is that a writer would need to check whether it is dealing
> with balanced parentheses, which don't need to be escaped, or unbalanced
> ones, which need a backslash. Since ((A)) has balanced parentheses, it
> doesn't need escaping, but (A)) would need to be written as (A\)).

A test for the writer would be useful.  You'd need to verify that the
resulting string is valid, but it doesn't matter which method (balancing
or escaping) is used for encoding parentheses -- this isn't specified as
part of the API.
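
One way such a test could check validity without caring which method
the writer chose is sketched below.  It takes the bytes between the
outer parentheses and accepts them if every unescaped parenthesis is
balanced.  The function name literal_body_is_valid is hypothetical,
and the sketch only looks at backslashes and parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if body[0..len), the text between the outer "(" and ")"
 * of a literal string, has only balanced unescaped parentheses.
 * Hypothetical test helper, not a library function. */
static bool
literal_body_is_valid (const char *body, size_t len)
{
  size_t i, depth = 0;

  assert (body != NULL || len == 0);
  for (i = 0; i < len; i++)
    {
      if (body[i] == '\\')
        {
          if (i + 1 == len)
            return false;   /* would escape the closing delimiter */
          i++;              /* skip the escaped character */
        }
      else if (body[i] == '(')
        depth++;
      else if (body[i] == ')')
        {
          if (depth == 0)
            return false;   /* would terminate the string early */
          depth--;
        }
    }
  return depth == 0;
}
```

Both "(A)" (from ((A))) and "A\)" (from (A\))) pass, while "A)" is
rejected, so the check is indifferent to balancing versus escaping.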

> Another option is to escape every parenthesis within a literal string
> when writing it to a stream. But I don't know whether that goes against
> the specification:
> "balanced pairs of parentheses within a string require no special treatment."
>

That statement doesn't forbid special treatment, so escaping all
parentheses would be valid.
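
For illustration, an escape-everything writer could look roughly like
the sketch below.  It is not the library's writer: the name
escape_literal_body and its interface are invented here, and it only
handles parentheses and backslashes (other bytes a real writer may
want to escape, such as CR, are passed through unchanged).

```c
#include <assert.h>
#include <stddef.h>

/* Escape every "(", ")" and "\" in in[0..in_len) with a backslash.
 * Writes at most out_size bytes to out (out may be NULL) and returns
 * the number of bytes the full result needs, so a caller can size the
 * output with a first call.  Hypothetical helper. */
static size_t
escape_literal_body (const char *in, size_t in_len,
                     char *out, size_t out_size)
{
  size_t i, needed = 0;

  assert (in != NULL || in_len == 0);
  for (i = 0; i < in_len; i++)
    {
      int special = (in[i] == '(' || in[i] == ')' || in[i] == '\\');

      if (special)
        {
          if (out != NULL && needed < out_size)
            out[needed] = '\\';
          needed++;
        }
      if (out != NULL && needed < out_size)
        out[needed] = in[i];
      needed++;
    }
  return needed;
}
```

The output never needs more than twice the input length, which makes
this simpler than tracking balance, at the cost of slightly larger
strings.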

> >Comments and strings can contain embedded null characters, so software
> >must not treat them as null terminated.  Maybe we should append a
> >non-null character to make such bugs obvious.
> 
> Sorry, but I don't see how appending a non-null character helps to
> detect such a bug.

The latest version of pdf_token_buffer_new puts an 'X' after every
buffer that's not null terminated.

The idea is that code treating the value as a C string, like
  printf("%s\n", pdf_token_get_string_data(token));
will produce output that's clearly incorrect.  If we didn't do this,
sometimes a null byte would follow the buffer by chance, and the code
would appear to work.
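
A standalone sketch of the effect, using a made-up demo_buffer type
rather than the real token buffer: the data is stored with an explicit
size and an 'X' sentinel after it.  The extra '\0' behind the sentinel
is an assumption of this demo only, so that strlen has defined
behaviour here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sized buffer; not the library type. */
struct demo_buffer
{
  char *data;
  size_t size;   /* number of valid bytes, excluding the sentinel */
};

/* Copy size bytes from src and append an 'X' sentinel.  The final
 * '\0' exists only to keep this demo well defined. */
static struct demo_buffer
demo_buffer_new (const char *src, size_t size)
{
  struct demo_buffer buf;

  assert (src != NULL || size == 0);
  buf.data = malloc (size + 2);
  buf.size = size;
  if (buf.data != NULL)
    {
      if (size > 0)
        memcpy (buf.data, src, size);
      buf.data[size] = 'X';
      buf.data[size + 1] = '\0';
    }
  return buf;
}
```

For the three bytes "abc", strlen on the data reports 4 and printing
with "%s" shows "abcX", so the misuse is visible at once.  For data
with an embedded null such as "ab\0cd", strlen reports 2 while the
size field still says 5.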

> Also, I just spotted that there is no PDF_EBADDATA as return value of
> pdf_token_reader_new as stated in the manual.

The documentation states
  PDF_EBADDATA: stm is not a reading stream.
but I don't see any API to get the mode of a stream.

Does anyone have a problem with a new public function
  enum pdf_stm_mode_e pdf_stm_get_mode (pdf_stm_t stm);  ?
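
To show the intent, here is a mock of how the reader constructor might
use such a getter.  Everything below is a stand-in with a demo_ prefix:
the enum values, the struct layout and the status codes are assumptions
for illustration, not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stream mode and status types. */
enum demo_stm_mode { DEMO_STM_READ, DEMO_STM_WRITE };
enum demo_status { DEMO_OK, DEMO_EBADDATA };

struct demo_stm
{
  enum demo_stm_mode mode;
};

/* Mock of the proposed getter: report the mode the stream was
 * opened with. */
static enum demo_stm_mode
demo_stm_get_mode (const struct demo_stm *stm)
{
  assert (stm != NULL);
  return stm->mode;
}

/* The check a token reader constructor could perform before
 * accepting a stream. */
static enum demo_status
demo_token_reader_check_stream (const struct demo_stm *stm)
{
  return demo_stm_get_mode (stm) == DEMO_STM_READ
         ? DEMO_OK
         : DEMO_EBADDATA;
}
```

A writing stream would then be rejected with the bad-data status, which
is the behaviour the documentation describes.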

-- Michael
