[pdf-devel] Re: [PATCH] Tokeniser API documentation


From: Michael Gold
Subject: [pdf-devel] Re: [PATCH] Tokeniser API documentation
Date: Tue, 19 May 2009 11:38:37 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Tue, May 19, 2009 at 12:23:20 +0200, address@hidden wrote:
>    address@hidden PDF_TOKEN_READABLE_STRINGS
>    +Encode strings in a human-readable way
>    +(i.e., in hexadecimal or with special characters escaped).
> 
> What are the semantics of this writing flag? Given that any string
> contents can be expressed in hexadecimal, when will it use the
> escaping of special characters? 

The idea is to leave the details up to the library, but produce
something that a person can read while debugging (otherwise, we'd only
need to escape backslash, carriage return, and unbalanced parentheses).
Character codes 0-9, 11-31, and 127 (at least) would be escaped, and the
library might write the whole string in hex if most of its characters
would otherwise need escaping.
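
To make the intended behaviour concrete, here is a minimal sketch of one
possible policy. It is not the library's implementation: the names
needs_escape, prefer_hex, and format_readable are hypothetical, and the
"more than half" threshold and the choice to escape every parenthesis
(rather than only unbalanced ones) are assumptions made to keep the
example short. Bytes of 128 and above are left alone here, although a
real implementation might escape them as well.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if the byte would be escaped in a
   readable literal string. */
static int
needs_escape (unsigned char c)
{
  /* Control codes other than line feed (10), plus DEL (127). */
  if ((c < 32 && c != 10) || c == 127)
    return 1;
  /* Assumption: escape every backslash and parenthesis instead of
     tracking whether parentheses are balanced. */
  return c == '\\' || c == '(' || c == ')';
}

/* Hypothetical policy: use hex when more than half of the bytes
   would need escaping. */
static int
prefer_hex (const unsigned char *buf, size_t size)
{
  size_t i, escaped = 0;

  for (i = 0; i < size; i++)
    escaped += needs_escape (buf[i]) ? 1 : 0;
  return escaped * 2 > size;
}

/* Writes a readable form of buf into out, NUL-terminated.
   Returns 0 on success, -1 if out is too small. */
static int
format_readable (const unsigned char *buf, size_t size,
                 char *out, size_t out_size)
{
  size_t i, pos = 0;
  int hex = prefer_hex (buf, size);

  /* Worst case: 4 characters per byte, two delimiters, and NUL. */
  if (out_size < size * 4 + 3)
    return -1;

  out[pos++] = hex ? '<' : '(';
  for (i = 0; i < size; i++)
    {
      unsigned char c = buf[i];

      if (hex)
        pos += (size_t) sprintf (out + pos, "%02X", (unsigned) c);
      else if (c == '\\' || c == '(' || c == ')')
        pos += (size_t) sprintf (out + pos, "\\%c", c);
      else if (needs_escape (c))
        pos += (size_t) sprintf (out + pos, "\\%03o", (unsigned) c);
      else
        out[pos++] = (char) c;
    }
  out[pos++] = hex ? '>' : ')';
  out[pos] = '\0';
  return 0;
}
```

With this policy, a mostly printable string is written as a literal with
a few escapes, while a mostly binary string is written in hex.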

I'm not sure how useful that is, or whether there's a better way to
express encoding preferences; but if somebody has unusually specific
requirements, they can use pdf_stm_write instead of pdf_token_write.

>    address@hidden pdf_status_t pdf_token_keyword_new (const pdf_char_t 
> address@hidden, pdf_size_t @var{size}, pdf_token_t address@hidden)
...
> Maybe would be good to add error checking on the contents of the
> keyword. For example, it cannot start with /. To return PDF_EBADDATA
> in that case may be appropriate. What do you think? The same would
> apply to the other token_*_new functions.

Sometimes the validity of a token will depend on the flags used when
writing it (e.g., a name can include any non-null character, unless
_NO_NAME_ESCAPES is specified when writing it).  Also, these checks will
be redundant when a constructor is used by the token reader; but they're
probably not a significant performance hit, so I'm not opposed to doing
them.

In particular, constructors could check for
  - delimiter or whitespace characters in a keyword
  - null characters in a name
  - line breaks in a comment
And I'm already planning to check for
  - NaN/infinity values for real tokens
  - invalid token types for pdf_token_valueless_new

(If the above things weren't checked by the constructors, they'd be
checked by the token writer, along with whatever checks are relevant
given the flags.)
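
The checks listed above could look roughly like the following. This is
only a sketch under stated assumptions: sketch_status_t and the check_*
functions are hypothetical stand-ins rather than the library's types or
API, and the whitespace and delimiter sets are the ones the PDF
specification defines. Checks that depend on writer flags are left out,
since they could not be done at construction time.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the library's status type (hypothetical). */
typedef enum { SKETCH_OK = 0, SKETCH_EBADDATA = 1 } sketch_status_t;

/* PDF whitespace: NUL, HT, LF, FF, CR, SP. */
static int
is_pdf_whitespace (unsigned char c)
{
  return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
}

/* PDF delimiters: ( ) < > [ ] { } / % */
static int
is_pdf_delimiter (unsigned char c)
{
  /* strchr would match the terminating NUL, so exclude 0 first. */
  return c != 0 && strchr ("()<>[]{}/%", c) != NULL;
}

/* A keyword may not contain delimiter or whitespace characters. */
static sketch_status_t
check_keyword (const unsigned char *buf, size_t size)
{
  size_t i;

  for (i = 0; i < size; i++)
    if (is_pdf_whitespace (buf[i]) || is_pdf_delimiter (buf[i]))
      return SKETCH_EBADDATA;
  return SKETCH_OK;
}

/* A name may not contain null characters. */
static sketch_status_t
check_name (const unsigned char *buf, size_t size)
{
  return memchr (buf, 0, size) == NULL ? SKETCH_OK : SKETCH_EBADDATA;
}

/* A comment may not contain line breaks (CR or LF). */
static sketch_status_t
check_comment (const unsigned char *buf, size_t size)
{
  if (memchr (buf, '\r', size) != NULL
      || memchr (buf, '\n', size) != NULL)
    return SKETCH_EBADDATA;
  return SKETCH_OK;
}

/* A real token may not hold NaN or an infinity. */
static sketch_status_t
check_real (double value)
{
  return isfinite (value) ? SKETCH_OK : SKETCH_EBADDATA;
}
```

A keyword beginning with "/" is rejected by check_keyword because "/" is
a delimiter, which covers the case raised in the review.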

I'm not currently planning to check implementation limits in the
constructors or token writer, since I don't expect the writer to have
such limits.  We could easily check them, but it would just be for the
benefit of readers.

The other things you commented on will be fixed in the next patch.

-- Michael
