help-gnu-emacs

Re: viewing docx files


From: Jude DaShiell
Subject: Re: viewing docx files
Date: Mon, 30 Jan 2017 02:21:39 -0500 (EST)
User-agent: Alpine 2.20 (NEB 67 2015-01-07)

I wonder whether the file utility can tell the difference between a docx file whose contents are UTF-8 and one whose contents are in another encoding. If it can, a little docx inspection could determine when the unzip->iconv->zip process is needed, so that it runs only when necessary.

On Sun, 29 Jan 2017, Joost Kremers wrote:

Date: Sun, 29 Jan 2017 17:51:00
From: Joost Kremers <joostkremers@fastmail.fm>
To: Tomas Nordin <tomasn@posteo.net>
Cc: Devin Prater <r.d.t.prater@gmail.com>, help-gnu-emacs@gnu.org
Subject: Re: viewing docx files


On Sun, Jan 29 2017, Tomas Nordin wrote:
Devin Prater <r.d.t.prater@gmail.com> writes:

Hi all. I'm running GNU Emacs (the latest brew install emacs version) on macOS Sierra. I run Emacs in the terminal and use the Emacspeak package for access, since I am blind. I received an email (in Gnus) with an attachment: two docx files for reading. I was able to download the attachments to my ~/ directory. I opened the file (C-x C-f, then tab completion), but it opened

I wonder if you would like to eval and try this:

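(defun docx2html (file)
  "Convert FILE to html in a buffer and display it."
  (interactive "f")
  (let ((html-buffer (format "*%s --> html*" file)))
    ;; FILE is passed as call-process's INFILE, i.e. it is fed to
    ;; pandoc's standard input; pandoc's output goes to HTML-BUFFER.
    (call-process "pandoc" file html-buffer nil "--to=html")
    (switch-to-buffer html-buffer)))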

After evaluating it, type M-x docx2html and select the docx file. See if
it works. It did not work for me, but that seems to be due to the
character encoding of my test files: the command runs, but the new
buffer only contains the following message from pandoc:

pandoc: Cannot decode byte '\xb1': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
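A possible alternative explanation for this error, offered as an untested assumption rather than a diagnosis: call-process's second argument is INFILE, so docx2html above feeds the zipped docx to pandoc on standard input, and pandoc, when given no --from option, reads standard input as Markdown text that must decode as UTF-8. The sketch below (my-docx2html is a hypothetical name) instead passes the file as a command-line argument and names the input format explicitly; it assumes a pandoc with docx input support is on exec-path.

```elisp
(defun my-docx2html (file)
  "Convert docx FILE to HTML with pandoc and display the result.
Hypothetical variant of `docx2html' that passes FILE as an argument."
  (interactive "fDocx file: ")
  (let ((buf (get-buffer-create (format "*%s --> html*" file))))
    (with-current-buffer buf
      ;; Start from an empty buffer if the command is run twice.
      (erase-buffer)
      ;; INFILE is nil, so pandoc opens FILE itself; --from=docx names
      ;; the input format explicitly.  Output is inserted into BUF.
      (call-process "pandoc" nil t nil
                    "--from=docx" "--to=html" (expand-file-name file)))
    (switch-to-buffer buf)))
```

Usage is the same as for docx2html: M-x my-docx2html, then pick the file.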

Pandoc only reads and writes UTF-8 and does no conversion. So if the files you want to convert and view are in another encoding, you'll need to re-encode them first. I'm not sure there's a tool that does that for docx files, though. iconv can convert text files from one encoding to another, but to use it on docx files you'll need to unzip them first (and zip them up again afterwards).
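A minimal Emacs Lisp sketch of the unzip -> iconv -> zip round trip, combined with the check suggested at the top of this message so that the conversion only runs when needed. Because the file utility examines the docx as a zip container, it will most likely not report the encoding of the XML inside, so the sketch reads the encoding attribute of the XML declaration in word/document.xml instead. All my-* names are hypothetical. It assumes the unzip, iconv and zip programs are installed, that every .xml and .rels part uses the encoding declared in word/document.xml, and that iconv accepts the declared encoding name.

```elisp
;; Hypothetical helpers (my-* names are illustrative, not an existing package).
;; Assumes the unzip, iconv and zip programs are on `exec-path'.

(defconst my-xml-encoding-regexp
  "<\\?xml[ \t\n][^>]*encoding=[\"']\\([^\"']*\\)[\"']"
  "Regexp matching an XML declaration; group 1 is its encoding.")

(defun my-xml-declared-encoding (xml)
  "Return the encoding declared in the string XML, downcased.
A missing declaration is treated as UTF-8, the XML default."
  (if (string-match my-xml-encoding-regexp xml)
      (downcase (match-string 1 xml))
    "utf-8"))

(defun my-docx-declared-encoding (file)
  "Return the encoding declared by word/document.xml inside docx FILE."
  (with-temp-buffer
    (let ((coding-system-for-read 'binary))
      ;; unzip -p prints the member's bytes to stdout; stderr is discarded.
      (call-process "unzip" nil '(t nil) nil
                    "-p" (expand-file-name file) "word/document.xml"))
    (my-xml-declared-encoding (buffer-string))))

(defun my-xml-declare-utf-8 ()
  "Change the XML declaration in the current buffer to encoding=\"UTF-8\"."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward my-xml-encoding-regexp nil t)
      (replace-match "UTF-8" t t nil 1))))

(defun my-docx-reencode (file from-encoding out-file)
  "Write OUT-FILE, a copy of docx FILE whose XML parts are converted
from FROM-ENCODING to UTF-8 (unzip -> iconv -> zip).  Return OUT-FILE."
  (let ((dir (make-temp-file "docx-" t))
        (out-file (expand-file-name out-file)))
    (unwind-protect
        (progn
          (unless (eql 0 (call-process "unzip" nil nil nil "-q"
                                       (expand-file-name file) "-d" dir))
            (error "unzip failed on %s" file))
          (dolist (part (directory-files-recursively
                         dir "\\.\\(xml\\|rels\\)\\'"))
            (with-temp-buffer
              ;; Keep bytes verbatim in both directions.
              (let ((coding-system-for-read 'binary)
                    (coding-system-for-write 'binary))
                (unless (eql 0 (call-process "iconv" nil '(t nil) nil
                                             "-f" from-encoding
                                             "-t" "UTF-8" part))
                  (error "iconv failed on %s" part))
                ;; iconv does not touch the declaration, so fix it here.
                (my-xml-declare-utf-8)
                (write-region nil nil part nil 'silent))))
          (when (file-exists-p out-file)
            (delete-file out-file))
          ;; Zip from inside DIR so the archive keeps relative paths.
          (let ((default-directory (file-name-as-directory dir)))
            (unless (eql 0 (call-process "zip" nil nil nil
                                         "-q" "-r" out-file "."))
              (error "zip failed on %s" out-file))))
      ;; Always remove the temporary directory, even after an error.
      (delete-directory dir t))
    out-file))

(defun my-docx-maybe-reencode (file out-file)
  "Return FILE if its XML is declared UTF-8, else a re-encoded OUT-FILE."
  (let ((encoding (my-docx-declared-encoding file)))
    (if (member encoding '("utf-8" "utf8"))
        file
      (my-docx-reencode file encoding out-file))))
```

For example, with hypothetical file names, (docx2html (my-docx-maybe-reencode "~/report.docx" "/tmp/report-utf8.docx")) would hand docx2html either the original file or the re-encoded copy. Reading the declared encoding, rather than guessing it, is what lets the expensive round trip be skipped for files that are already UTF-8.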



--



