[pdf-devel] hello, qpdf (http://sourceforge.net/projects/qpdf)


From: Jay Berkenbilt
Subject: [pdf-devel] hello, qpdf (http://sourceforge.net/projects/qpdf)
Date: Sun, 01 Nov 2009 16:54:12 -0500

Hello.  This is my Hello message. :-)

I've recently become aware of gnu-pdf, and so I wanted to make the
project aware of my open source PDF inspection/manipulation software,
qpdf.  qpdf is released under the terms of version 2.0 of the Artistic
License, but I would be supportive of inclusion of any of its code, or
of use of any of its code for ideas for gnu-pdf.  Since qpdf is written
in C++ and certainly has a different underlying architecture, most of
the code probably won't "drop in" in its present form, but it may still
be useful.

qpdf is a library whose focus is structural reorganization of PDF files.
It is also somewhat of a PDF hacker's toolkit, which is where most of
its usefulness may be to people who are working on gnu-pdf.  It has a
PDF structure checker and can also do a number of content-preserving
transformations.  Here is a partial list:

 * Linearization

 * Conversion to or from object streams

 * Encryption/Decryption (R=2,3,4 including AES)

One of the features of qpdf that I find most useful (and the original
reason I wrote the software) is a form that I call "QDF" form.  This is
a form intended for helping PDF experts look at and work with PDF files
in an ordinary text editor.  QDF files are fully valid PDF files that
are laid out in a particular way and have some extra comments in them
that help with reconstruction of the cross reference table (or stream)
after the file is manually edited.  When qpdf writes QDF files, it also
uncompresses all streams that it knows how to uncompress.  There's a
companion Perl script called "fix-qdf" that reads a QDF file as input
and writes a new one as output with a corrected cross reference table
and, if object streams are in use, the offset tables at the beginnings
of object streams.  It also fixes all stream lengths.  This makes it
possible to generate a QDF file, hack away at the content streams or
other structures, repair the damage, and convert back to a normal PDF
file with compressed content streams, etc.  It can be a big help when
learning about PDF or when experimenting with ideas that you may want to
build into the code.
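
As a rough illustration of the kind of repair described above, here is a
minimal Python sketch that rescans a hand-edited file for object headers
and rebuilds a classic cross reference table from their byte offsets.
It is not fix-qdf's actual code: the function names are hypothetical,
and it assumes every object header ("N G obj") starts at the beginning
of a line and that no stream data happens to look like one.  It does not
handle object streams, stream lengths, or a properly linked free list.

```python
import re

# Matches "N G obj" at the start of a line, e.g. b"12 0 obj".
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def find_object_offsets(data):
    """Map object number -> (byte offset, generation) for each header found."""
    offsets = {}
    for m in OBJ_HEADER.finditer(data):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    return offsets


def build_xref_table(data):
    """Build a classic xref table covering object numbers 0..max."""
    offsets = find_object_offsets(data)
    size = max(offsets, default=0) + 1
    lines = [b"xref", b"0 %d" % size]
    # Object 0 is always the head of the free list.
    lines.append(b"0000000000 65535 f ")
    for num in range(1, size):
        if num in offsets:
            off, gen = offsets[num]
            # Each entry is 20 bytes including its end-of-line marker.
            lines.append(b"%010d %05d n " % (off, gen))
        else:
            # Simplification: gaps are marked free without linking them.
            lines.append(b"0000000000 00000 f ")
    return b"\n".join(lines) + b"\n"
```

Calling build_xref_table() on the bytes of an edited file returns the
text of a new table; a real tool would also have to update the
startxref offset and the /Size entry in the trailer.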

qpdf also has the ability to automatically recover from several common
forms of damage to PDF files, and when it can't recover, it gives
detailed developer-oriented error messages that can help you manually
recover broken files.  I have manually rescued many PDF files that
couldn't be opened by any PDF software, and qpdf's automatic recovery
often works better than Adobe Reader's automatic recovery, though I'm
sure there are also cases where other readers will do a better job.

qpdf can decode the following filters: Flate, LZW, ASCII85, and
ASCIIHex.  It can also encode with Flate.  The Flate decode filter is
implemented using zlib.  The others are hand-coded inside qpdf.  qpdf
uses a simple pipeline system I put together for these and also for
encryption, which makes it pretty easy to work with chains of filters.
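
To give a feel for what chaining filters in a pipeline can look like,
here is a small Python sketch.  It does not reproduce qpdf's C++
pipeline classes: every class and function name below is hypothetical,
and the design (each stage receives bytes through write(), forwards its
output to the next stage, and passes finish() down the chain) is only
an assumption about one reasonable way to structure such a system.

```python
import binascii
import zlib


class Collector:
    """Terminal stage: accumulates whatever is written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        pass

    def result(self):
        return b"".join(self.chunks)


class FlateDecode:
    """Inflates zlib-compressed data and forwards it to the next stage."""

    def __init__(self, next_stage):
        self.next = next_stage
        self.inflater = zlib.decompressobj()

    def write(self, data):
        out = self.inflater.decompress(data)
        if out:
            self.next.write(out)

    def finish(self):
        tail = self.inflater.flush()
        if tail:
            self.next.write(tail)
        self.next.finish()


class AsciiHexDecode:
    """Decodes ASCIIHex: hex digits, whitespace ignored, '>' ends the data."""

    def __init__(self, next_stage):
        self.next = next_stage
        self.pending = b""  # unpaired hex digit carried between writes
        self.done = False

    def write(self, data):
        if self.done:
            return
        end = data.find(b">")
        if end != -1:
            data = data[:end]
            self.done = True
        digits = self.pending + bytes(c for c in data if c not in b" \t\r\n\f\0")
        usable = len(digits) - (len(digits) % 2)
        self.pending = digits[usable:]
        if usable:
            self.next.write(binascii.unhexlify(digits[:usable]))

    def finish(self):
        if self.pending:
            # A trailing odd digit is treated as if it were followed by '0'.
            self.next.write(binascii.unhexlify(self.pending + b"0"))
        self.next.finish()


def decode_hex_then_flate(chunks):
    """Run chunks through ASCIIHex decoding followed by Flate decoding."""
    sink = Collector()
    pipeline = AsciiHexDecode(FlateDecode(sink))
    for chunk in chunks:
        pipeline.write(chunk)
    pipeline.finish()
    return sink.result()
```

Because each stage only knows about the next one, a stream with several
filters is handled by nesting constructors in the order the filters are
listed, and data can be fed in arbitrary chunk sizes.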

If you're interested in more information, I encourage you to download
qpdf and look at its documentation or read comments in the public header
files.  You can also find a documentation link from qpdf's main website,
http://qpdf.sourceforge.net.

For those of you using Debian GNU/Linux (or Ubuntu), qpdf is
available in the archive.  I just uploaded the newly-released 2.1 on
Friday, so that version can only be found in Debian unstable.  The older
2.0.6 release is in Debian Lenny and in the current Ubuntu release.  It
doesn't support R=4 encryption and its recovery features are not quite
as good, but it has the other features I mentioned.

Unfortunately, I doubt that I will have much time available to
contribute code to the GNU PDF project, at least for now, but as someone
who is pretty familiar with the PDF specification (at least at the
structural level), I may lurk on the list and chime in when I feel that
I have something to offer.  Again, I encourage you to make use of qpdf
in whatever way you can, whether by taking code or ideas from it or by
just using it as a tool to help you look at and experiment with PDF
files.

Whether you use qpdf or not, I wish everyone the best of success on the
GNU PDF project.  I think it's an important contribution to the overall
suite of available tools.

-- 
Jay Berkenbilt <address@hidden>



