Re: [pdf-devel] goals and motivations


From:	jemarch
Subject:	Re: [pdf-devel] goals and motivations
Date:	Wed, 01 Aug 2007 17:35:13 +0200
User-agent:	Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/22.1.50 (powerpc-unknown-linux-gnu) MULE/5.0 (SAKAKI)
Hello.

   Dave Crossland pointed out the GNU PDF project to me today. It's great 
   to see more Free Software work in the PDF space!

It is really a privilege to see you here. Many thanks for your work
on Ghostscript. I admire you, really! :)

   I'm a little puzzled about what you're writing though. 

Sorry for the confusion. We discussed this issue for a long time
before launching the project, and not all of that information is
written down on the web page. We shall fix that as soon as possible.

   The "Goals and motivations" page has lots of information about why
   you'd want a PDF implementation, but nothing about why you're
   writing a new one, or what your goals are. There's already
   Ghostscript (GPLv2) and libpoppler/xpdf (GPLv2+) on the rendering
   side, and libcairo (LGPL2.1/MPL1.1) on the generation side.

The main goal is to provide complete, high-quality software to
manage PDF content. For all the reasons given on the
GoalsAndMotivations web page, we really _need_ free access to PDF
technology. We plan to have the GNU PDF project designated as an FSF
high priority project. We are also working to get funds from
governments in order to pay developers. From our point of view, it is
quite an urgent issue.

As you point out, there is existing software that may provide
something like that, but we decided to start a new project from
scratch due to some requirements we have in mind:

- completeness
- portability
- efficiency
- robustness regarding legal issues

We researched in depth the existing free software packages
implementing the PDF file format. For one reason or another, we
decided not to reuse those programs.

Let me quickly explain what our reasons are.

Ghostscript is a marvelous piece of software. Its coverage of the
PostScript specification is really good, and its ability to run even
on a toaster is impressive. But as you know (surely better than any
other human being :)) the Ghostscript codebase is also huge and quite
complex. Note that I am not saying it is too complex for the tasks it
implements: as Peter Deutsch says, using the gs allocators with GC at
the C level is not a happy thing, but we don't know a better way to
do it. I agree with that. Ghostscript is complex just because it
implements complex things. And I consider that level of complexity to
be very well managed in Ghostscript. Again, you are one of my
hacker heroes :)

But the complexity associated with PostScript interpretation is not
needed for PDF interpretation. We prefer to work on a lightweight PDF
interpreter. As far as I know (I may be wrong), similar reasons led
the Ghostscript people to launch GhostPDF and the MuPDF+Fitz
prototype.

We also had other, minor considerations in mind when deciding not to
use Ghostscript for this task. The PDF interpreter distributed with
Ghostscript is written in PostScript. It is not easy to find hackers
able (or willing) to write PostScript. Also, it is difficult for
other applications to interact with Ghostscript in order to, for
example, extract information from a PDF file. The libextractor
maintainer decided to use poppler for this reason (and still he is not
very happy, since poppler has some difficulties that I will address
later). The GNU PDF library should give GNU software (and free
software in general) convenient access to the Adobe technologies
regarding PDF.

We also considered using xpdf or the poppler library. Almost all free
software viewers supporting PDF use that library, after all. It works
and is actively maintained. But we found enough arguments not to use
it. First of all, there is the portability issue. poppler is written
in C++ and makes extensive use of the Standard Template Library. If
it is difficult to write portable C code, using C++ is asking for
portability problems. Someone may want to use the GNU library in an
embedded device, for example. There is another reason against using
C++ for the library: the vast majority of the GNU system is written in
C, and one of the goals of the library is to provide convenient PDF
support to other GNU packages.

We are using a boring but, we hope, effective method to achieve
completeness: to design and implement in "breadth" rather than in
"depth". Before we even think about implementing the lexer or the
parser, for example, we want to have support for all the filters (even
the rarely used ones, and including the encryption ones), all the
structured PDF objects (including PDF functions of all types), etc. We
don't want to move on to the next chapter of the specification
(figuratively speaking; you know the PDF spec is not exactly linear)
until we have complete support for the previous ones. From this point
of view, the objectives of the GNU PDF project may differ enough from
those of Ghostscript and poppler to justify a new implementation. As
we see it, the Ghostscript goals are oriented more toward good
PostScript support than toward good PDF support (it is a bigger and
more difficult objective!). In a similar way, the objective of the
poppler project seems to be more visualization-oriented than aimed at
providing good interfaces for PDF editing.

Finally, we also turned our attention to MuPDF+Fitz. Again, we
detected some degree of divergence in objectives: the author of both
MuPDF and Fitz seems to be more interested in a superb graphics
library implementation (Fitz) than in its interface with PDF
(MuPDF). It is a wonderful task and I think he is doing very good
work in adapting Fitz to support several distinct imaging models (such
as the Metro support). I would not be able to do such good work.

I hope I succeeded in explaining our point of view. My English is far
from perfect and I don't want to cause any misunderstanding :)

   There is a need for a low level pdf object and stream manipulation 
   library. Your roadmap doesn't mention a graphics library, so perhaps 
   that's your intent, but then why do you talk about image-only filters 
   like JBIG2 and JPX? And on the front page you mention implementing 
   support for PDF 1.7 and some other rather large specs.

Indeed, we plan to support the entire PDF 1.7 specification and maybe
the XMP one. The roadmap is far from being complete. 

Please note that we may not need to write everything from scratch. We
are considering using an existing graphics library capable of handling
the Adobe imaging model (Fitz may be a very good option, I think). The
Ghostscript implementation of Type 1 and TrueType fonts is also quite
attractive! ;)

   I also fear GPL will be an unpopular license for such a library.

Well, to be honest, I also have my own fears about this. But right
now it is a GNU policy and we agree with it. The rationale behind the
use of GPLv3 is well explained in
http://www.gnu.org/licenses/rms-why-gplv3.html

I don't have better arguments than rms does! (Karl Berry dixit) :)

   Anyway, I'm just curious about what the larger plan was.

Again, many, many thanks for your interest. I am somewhat scared about
this project. I stepped down as the maintainer of other GNU packages
to be able to dedicate my entire hacking time to it.

-- 
Jose E. Marchesi  <address@hidden>
                  <address@hidden>

GNU Spain         http://es.gnu.org
GNU Project       http://www.gnu.org