From: Christian Grothoff
Subject: Re: [libextractor] libextractor-0.5.17 (win32) - extract.exe stops with error msg
Date: Sun, 22 Apr 2007 15:03:43 -0600
User-agent: KMail/1.9.5

On Sunday 22 April 2007 14:12, Nils Durner wrote:
> > Neither Qt nor GTK are reasonable for ReactOS source tree, I will
> > probably write my own lib which generates thumbnails using freeimage
> > lib API functions.
>
> If it works with broken files (doesn't crash), this might be something
> we're interested in.

Also, it needs to be reentrant and must not print arbitrary error messages to
the console for broken files.  I tried a bunch of libraries, and these were
the main problems that most of them had.  It would be great if, in addition
to scaling down, the library also supported reducing the color depth (the
existing code does not support that, and I consider it a missing feature).
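
To illustrate the requirements above, here is a minimal sketch of color-depth
reduction, written in Python for brevity.  The function name and interface
are hypothetical; this is not code from LE or freeimage.  It assumes 8-bit
RGB pixels given as tuples, keeps no shared state (so it is safe to call
from several threads), and reports bad input by raising an exception rather
than printing to the console.

```python
def reduce_color_depth(pixels, bits):
    """Quantize 8-bit RGB pixels to `bits` bits per channel.

    Hypothetical helper for illustration only. It uses no global state
    and never prints; malformed input raises ValueError instead.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits  # number of low-order bits to discard per channel
    out = []
    for px in pixels:
        if len(px) != 3:
            raise ValueError("each pixel must have three channels")
        for c in px:
            if not isinstance(c, int) or not 0 <= c <= 255:
                raise ValueError("channel values must be integers in 0..255")
        # Drop the low bits, then shift back so values stay in the 0..255 range.
        out.append(tuple((c >> shift) << shift for c in px))
    return out
```

A real thumbnailer would apply this after scaling down, and the caller would
decide what to do with a broken file (skip it, log it, and so on).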

> > In the LE mailing list archive, I have read that LE don't come with
> > extended inbuild pdf support anymore, due security issues related to
> > xpdf.
> > Are the xpdf issues still valid? pdftotext (poppler) and pdfinfo
> > (xpdf) would be really handy.

The issues are still there -- mostly because the xpdf code is a big mess (at
least it was the last time I looked at it) and was not written with security
and input validation as the primary concern.  That is not to say that xpdf
was not a major contribution.  I use xpdf frequently to view trusted PDF
files, and I like it much better than any other PDF viewer out there.  I just
don't think the code is suitable as the default PDF plugin for LE.  Among
other things, the very nature of xpdf implies that the code must do much more
than what LE requires, which makes it both slower and more complex.

Note that other plugins (RPM in particular) have similar issues.  The question
is whether to spend more time patching things up or to fix things for real.
Naturally, to some extent this can be decided by whoever puts in the work;
I'll be happy to accept any patches (including those that update the
xpdf-based plugin) as long as it is clear that they improve things overall.

> > Version 3.02 (2007-02-27) of xpdf has fixed several security holes and
> > it does now support PDF 1.6 and PDF 1.7 :-)

Fixes are great, but the problem is that quite a few people (check Debian
security, for example) consider the entire codebase to be of such bad quality
that such fixes are unlikely to be a silver bullet for security here.

> > There have been a lot of changes and improvements since v. 3.01
> > (2005-08-17), so it might be a good idea (if not already done) to
> > review the latest version.

The other problem is that xpdf is just too big.  We always had to cut out
huge portions of the code, since LE only needs a tiny fraction of it anyway.
There is an unfinished PDF extractor in SVN that parses PDF 1.4.  My goal is
to eventually have fast, special-purpose PDF extractor code in LE that is both
simple and secure and only does what we need (and of course supports PDF 1.7).
However, I've been focusing on other things lately, and this would be a major
effort.  So help would be very welcome.
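
As a rough illustration of the "only does what we need" idea, here is a
minimal sketch that reads nothing but the version from the `%PDF-M.m` header
line.  It is written in Python for brevity and is not the extractor in SVN.
The function name and the 1024-byte scan limit are assumptions made for this
example.  Note that, as of PDF 1.4, a Version entry in the document catalog
may override the header, and this sketch does not look at that.

```python
import re

# Matches the "%PDF-M.m" header marker; \d on bytes matches ASCII digits only.
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data, scan_limit=1024):
    """Return (major, minor) from the PDF header, or None if not found.

    Hypothetical helper for illustration only. It inspects at most
    `scan_limit` bytes, so a huge or broken file cannot make it do
    unbounded work, and it returns None instead of printing an error.
    """
    match = _HEADER_RE.search(data[:scan_limit])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Called on the first bytes of a file, for example
`pdf_header_version(open("2.pdf", "rb").read(1024))`, this yields a tuple
such as `(1, 4)` for a file whose header says `%PDF-1.4`.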

> > With current LE's pdf lib (based on PDFDoc), I can only extract a
> > handful of metadata but not text-content and other metadata.
> >
> > e.g.
> > creation date - 20051005173339+02'00'
> > producer - OpenOffice.org 1.9.79
> > format - PDF 1,0
> > mimetype - application/pdf
> >
> > ... btw. the pdf format version is 1.4 (and not 1.0) in that example
> > case, as Adobe Acrobat 8 told me. I have used "extract.exe -ad 2.pdf >
> > 2.pdf.txt" to extract the metadata using libextractor-0.5.18.zip
> > (win32 binary official package).
> >
> > Currently, pdf support is a minor priority for me; although in about
> > one year, it might get important feature, as both my code and ReactOS
> > itself get more matured.

Same here.  :-).

Christian



