pdf-devel

Re: [pdf-devel] Updated tokeniser/parser patch


From: Michael Gold
Subject: Re: [pdf-devel] Updated tokeniser/parser patch
Date: Thu, 22 Jan 2009 19:31:05 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Jan 23, 2009 at 00:11:43 +0100, address@hidden wrote:
>    >     can we use
>    >     the same data type (pdf_obj_t) for both the public interface and
>    >     to be used by the parser?, etc.
> 
>    It's obvious to me that we should, since many of the types (like strings)
>    would end up being identical. One thing I wanted to discuss, though, was
>    whether it's also OK to store tokens in pdf_obj_t.
> 
> It is not that obvious. The public interface should include a document
> where to create the object in (see the pdf_obj_stream_new preliminary
> declaration above) in order for it to register the object.
> 
> But do we want to let the parser know about a pdf_obj_doc_t? I don't
> think so (though I am not sure of the opposite either). Likely we will
> need a different interface.

Right now the parser doesn't deal with documents at all, and probably
shouldn't (the reader should handle that). Maybe we'll want a separate
object that links a document and pdf_obj_t, but we can discuss it later.

> It is a pity that you dedicated all that time working on that
> obsolete pdf_obj* code: quite probably we won't use it at all.

For the basic type, something like pdf_obj_t is needed by the tokeniser
anyway even if it's never exported publicly. As for the container
types, that was basically just an experiment to see what type of
interface would be reasonable, which is why I left it unfinished.

> But it is also my fault to keep that old obsolete code in the
> repository. Sorry about that.

That's no problem. I realize some of that may be thrown away, but it
could still be useful for ideas.

>    We wouldn't necessarily lose all the benefits of opaque pointers. Only
>    the size of the structure needs to be public in order to allocate it on
>    the stack. The contents don't need to be documented, and we could leave
>    some padding for future use as recommended in
>      http://people.redhat.com/drepper/goodpractice.pdf
> 
> Good point! How would that be done for our public data types
> implemented as structures? Can you provide an example?

The paper gives this example:
  struct the_struct
  {
    int foo;
    // ...and more fields
    uintptr_t filler[8];
  };
But we could make the whole thing filler and avoid declaring any fields.
We'd then need to cast it to the proper internal type when using it (we
should verify sizeof(public_struct) >= sizeof(actual_struct) as a sanity
check).
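
A minimal sketch of the all-filler variant, to make the idea concrete.
Every name here (pdf_example_t, struct pdf_example_impl, the two
functions) is hypothetical and not part of the library. The size check
uses the negative-array-size trick so it fails at compile time. This
sketch copies through memcpy instead of casting the pointer, which
sidesteps the alignment and aliasing questions a direct cast would
raise:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Public side: callers see only the size (hypothetical name). */
typedef struct
{
  uintptr_t filler[8];
} pdf_example_t;

/* Private side: the actual layout (hypothetical fields). */
struct pdf_example_impl
{
  int kind;
  double value;
};

/* Fails to compile (negative array size) if the filler is too small. */
typedef char pdf_example_size_check
  [(sizeof (pdf_example_t) >= sizeof (struct pdf_example_impl)) ? 1 : -1];

/* Store the internal representation inside the opaque filler. */
void
pdf_example_init (pdf_example_t *obj, int kind, double value)
{
  struct pdf_example_impl impl;
  impl.kind = kind;
  impl.value = value;
  memset (obj, 0, sizeof *obj);
  memcpy (obj, &impl, sizeof impl);
}

/* Read a field back out of the opaque filler. */
double
pdf_example_value (const pdf_example_t *obj)
{
  struct pdf_example_impl impl;
  memcpy (&impl, obj, sizeof impl);
  return impl.value;
}
```

A caller can then declare a pdf_example_t on the stack and pass its
address to these functions without ever seeing the real fields.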

BTW, another paper some people may find useful is
  http://people.redhat.com/drepper/dsohowto.pdf
Maybe we should link those papers from the web site or hacker's guide.

>    > Yes, we noticed that. Would be really nice to modify the list module
>    > to not crash if 'xalloc_die' returns NULL (does nothing). Really, I
>    > think that it is the only way to use the list module in a library.
>    > 
>    > I noticed that you are active in the gnulib development mailing
>    > list. Would you want to raise the issue there?
> 
>    Yes, I was planning to raise it once I update c-strtod.
> 
> Superb! I don't know how many gnulib modules are relying on a
> "killing" xalloc_die (I suspect some of them do), but it would be
> really good to have that fixed.

Based on a quick grep: mkdir, basename, dirname, and the list modules.

xalloc_die would still be used in some cases, like when someone tries to
put more than INT_MAX objects in a container -- that's insane, so I'm
not worried about cases like that.
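
For illustration, this is roughly the behaviour a library wants from a
container: allocation failure comes back as a status code and the
caller decides what to do. It is a sketch only, not the gnulib list
API; struct example_list, example_list_add and the status names are
made up. Unlike what I describe above, it also reports the INT_MAX
overflow case as an error instead of dying:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes. */
enum example_status { EXAMPLE_OK = 0, EXAMPLE_ENOMEM = 1 };

/* Hypothetical growable array of pointers; zero-initialise before use. */
struct example_list
{
  void **items;
  int count;
  int capacity;
};

/* Append ITEM, growing the array if needed.  Returns EXAMPLE_ENOMEM
   instead of aborting when memory (or the int range) runs out; the
   list is left unchanged in that case. */
enum example_status
example_list_add (struct example_list *list, void *item)
{
  if (list->count == list->capacity)
    {
      int new_capacity;
      void **new_items;

      if (list->capacity > INT_MAX / 2)
        return EXAMPLE_ENOMEM;
      new_capacity = list->capacity ? list->capacity * 2 : 4;
      new_items = realloc (list->items,
                           (size_t) new_capacity * sizeof *new_items);
      if (new_items == NULL)
        return EXAMPLE_ENOMEM;
      list->items = new_items;
      list->capacity = new_capacity;
    }
  list->items[list->count++] = item;
  return EXAMPLE_OK;
}
```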

> These are mainly lexical issues... maybe we could think about moving
> the lexer module to the base layer. In that way the fp module could
> use it in the little type 4 functions parser.
> 
> What do you (and people) think about that? If you agree I will open
> some NEXT tasks to create the new module (and its tests, etc) and will
> mark it as a dependency for the error reporting in type 4 functions.

I don't have a problem with it, but it will need something like
pdf_obj_t to store these types:
  int, real, string, name, comment, keyword
as well as the valueless types corresponding to
  "{", "}", "<<", ">>", "[", "]"

And the parser will want to put these objects inside dicts and arrays,
though it could convert them or wrap them if necessary.
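
As a rough sketch of what such a type might hold (this is not the
actual pdf_obj_t; the enum, struct and function names are all
hypothetical), a tagged union covers the six valued types and the six
valueless ones:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds: the first six carry a value, the rest
   are valueless delimiters. */
enum pdf_token_type
{
  PDF_TOKEN_INTEGER,
  PDF_TOKEN_REAL,
  PDF_TOKEN_STRING,
  PDF_TOKEN_NAME,
  PDF_TOKEN_COMMENT,
  PDF_TOKEN_KEYWORD,
  PDF_TOKEN_PROC_START,   /* "{"  */
  PDF_TOKEN_PROC_END,     /* "}"  */
  PDF_TOKEN_DICT_START,   /* "<<" */
  PDF_TOKEN_DICT_END,     /* ">>" */
  PDF_TOKEN_ARRAY_START,  /* "["  */
  PDF_TOKEN_ARRAY_END     /* "]"  */
};

struct pdf_token
{
  enum pdf_token_type type;
  union
  {
    long integer;                 /* PDF_TOKEN_INTEGER */
    double real;                  /* PDF_TOKEN_REAL */
    struct
    {
      const char *data;           /* not necessarily NUL-terminated */
      size_t size;
    } buffer;                     /* string, name, comment, keyword */
  } value;
};

/* Nonzero if the token type carries a value in the union. */
int
pdf_token_has_value (const struct pdf_token *tok)
{
  return tok->type <= PDF_TOKEN_KEYWORD;
}
```

The parser could wrap or convert values of this kind when it builds
dicts and arrays, so the container types need not live in the lexer.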

-- Michael
