[libextractor] Microsoft Office mimetype (OLE2) is not recognized reliab
From: Marc
Subject: [libextractor] Microsoft Office mimetype (OLE2) is not recognized reliable
Date: Sat, 30 Aug 2008 23:44:15 +0200
User-agent: KMail/1.9.9
Hi,
great work on libextractor. I enjoy learning with it and figuring things out
as I start to learn Python.
One problem I noticed:
I am trying to distinguish the different Microsoft Office file formats
using the mimetype information provided by libextractor (the files I am
investigating have no filename extensions). The problem is that often only
a generic type, e.g. "application/vnd.ms-office", is extracted. The result
depends on which application was used for the last save of the
document/spreadsheet/presentation.
I found that other programs have similar problems with this job:
- In Kubuntu Hardy, the Linux distribution I use, XLS files without a
filename extension appear as DOC in Konqueror
- Windows XP can't do it either (in the file manager)
- I also tried NLNZ Metadata Extractor v3.0 without success
- The file command in the shell gives the wrong application type too
OpenOffice, however, can open all of these formats without a filename
extension and imports them into the correct application
(Writer/Calc/Impress).
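That suggests the type can be recovered from the file content itself. A
minimal sketch of one content-based approach follows. It is an assumption on
my part, not how the libextractor OLE2 plugin or OpenOffice actually works.
OLE2 files start with a fixed 8-byte signature and store their stream names
in 128-byte directory entries. Word files conventionally contain a stream
named "WordDocument", Excel files "Workbook" (or "Book" in older versions),
and PowerPoint files "PowerPoint Document". The function name
guess_ole2_mimetype and the mapping table are hypothetical, and the scan is
a shortcut that checks every 128-byte slot instead of following the sector
chain, so treat the result as a heuristic.

```python
import struct

# Signature found at the start of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical mapping from well-known stream names to mimetypes.
STREAM_TO_MIMETYPE = {
    "WordDocument": "application/msword",
    "Workbook": "application/vnd.ms-excel",
    "Book": "application/vnd.ms-excel",  # older Excel versions
    "PowerPoint Document": "application/vnd.ms-powerpoint",
}


def guess_ole2_mimetype(data):
    """Return a guessed mimetype for OLE2 data, or None if it is not OLE2."""
    if not data.startswith(OLE2_MAGIC):
        return None
    # Directory entries are 128 bytes long and lie on 128-byte boundaries
    # after the 512-byte header. Heuristic: test every aligned slot rather
    # than following the sector allocation chain.
    for offset in range(512, len(data) - 127, 128):
        # Bytes 64-65 of an entry hold the name size in bytes,
        # including the two-byte terminator.
        (name_size,) = struct.unpack_from("<H", data, offset + 64)
        if not 2 <= name_size <= 64 or name_size % 2:
            continue
        raw_name = data[offset:offset + name_size - 2]
        name = raw_name.decode("utf-16-le", errors="replace")
        if name in STREAM_TO_MIMETYPE:
            return STREAM_TO_MIMETYPE[name]
    # OLE2 container, but no known stream name was seen.
    return "application/vnd.ms-office"
```

To try it on a file, read the file in binary mode and pass the bytes to
guess_ole2_mimetype(). A result of None means the file is not OLE2 at all.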
I use libextractor 0.5.18a and Python-Extractor 0.5-2. In the ChangeLog I
didn't find any changes to the OLE2 plugin since version 0.5.18a.
Has anyone encountered the same problem? How could this be solved?
Best regards,
Marc