viewmail-info
Re: [VM] displaying text/html with utf-8 via w3m


From: Ralf Fassel
Subject: Re: [VM] displaying text/html with utf-8 via w3m
Date: Thu, 29 Jan 2015 10:57:51 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

* Uday Reddy <address@hidden>
| Ralf Fassel writes:
>
| > The umlaut-a has been replaced by a space.  If I run the text manually
| > through w3m, the output still contains the utf-8 character, just the
| > HTML-markup is gone.  But after inserting in the Presentation buffer the
| > umlaut is changed to a space.
>
| The first thing to check would be whether it is a problem with Emacs.  What
| happens if you put the w3m output in a file and visit it in Emacs?
>
| If it is not a problem with Emacs, then it must be a bug in VM.  Please file
| a bug report along with a sample message.

Ok, now I found the time to dig into this.  It is a mismatch between the
encoding Emacs uses when writing to the w3m process (Emacs'
default-process-coding-system) and the encoding w3m is told to expect
(its -I option).

The message itself has
    Content-Type: text/plain; charset="UTF-8"
    Content-Transfer-Encoding: Quoted-Printable

Now VM prepares the message for w3m by calling
  'vm-mime-display-internal-text/html'
which does
  (vm-mime-transfer-decode-region layout start end)
  (vm-mime-charset-decode-region charset start end)

So now the region contains utf-8.
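
To illustrate the two decoding steps outside of VM, here is a minimal
sketch that uses the standard Emacs `qp' library instead of VM's own
functions.  The helper name `my-decode-qp-utf-8' is hypothetical and not
part of VM; it only mimics "transfer-decode, then charset-decode" on a
string.

```elisp
(require 'qp)  ; quoted-printable support shipped with Emacs

;; Hypothetical helper, not part of VM.
(defun my-decode-qp-utf-8 (raw)
  "Decode RAW, a quoted-printable encoded UTF-8 string, to Emacs text."
  ;; Step 1: undo the Content-Transfer-Encoding (yields raw bytes).
  ;; Step 2: interpret those bytes according to the charset (UTF-8).
  (decode-coding-string (quoted-printable-decode-string raw) 'utf-8))

;; Example call: "=C3=A4" is the quoted-printable form of umlaut-a.
(my-decode-qp-utf-8 "J=C3=A4ger")
```

After both steps the text holds real characters again, so whatever
encoding is applied when it is sent to a subprocess is decided later.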

Then w3m is called via
   'vm-mime-display-internal-w3m-text/html'
which uses 'shell-command-on-region'.

However, shell-command-on-region is documented as

    By default, the input (from the current buffer) is encoded using
    coding-system specified by `process-coding-system-alist', falling
    back to `default-process-coding-system' if no match for COMMAND is
    found in `process-coding-system-alist'.

In my setup process-coding-system-alist is nil, but
'default-process-coding-system' is (iso-latin-9-unix . iso-latin-9-unix).
So what is really sent to w3m is latin-9, while w3m is told to process
the input as UTF-8.  The latin-9 bytes for non-ASCII chars are not valid
UTF-8, and they end up as " " in the output.
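
The mismatch can be seen at the byte level with a small sketch (the
variable names are mine, chosen for illustration).  Umlaut-a is the
single byte 0xE4 in latin-9 but the two bytes 0xC3 0xA4 in UTF-8, so a
lone 0xE4 is not something a UTF-8 reader can interpret.

```elisp
;; Compare the bytes Emacs would send for umlaut-a under each coding system.
(defvar my-umlaut-a "\u00e4")

(defvar my-bytes-latin-9
  (string-to-list (encode-coding-string my-umlaut-a 'iso-latin-9)))

(defvar my-bytes-utf-8
  (string-to-list (encode-coding-string my-umlaut-a 'utf-8)))

;; One byte versus two: the streams differ for every non-ASCII char.
(message "latin-9: %S  utf-8: %S" my-bytes-latin-9 my-bytes-utf-8)
```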

If I temporarily bind
  (default-process-coding-system '(utf-8 . utf-8))
then the Presentation buffer contains the correctly decoded mail message.

IMHO 'vm-mime-display-internal-w3m-text/html' should temporarily adjust
the default-process-coding-system to match what w3m is told to expect.
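
A rough sketch of what I mean, not a patch against VM: the helper name
`my-filter-region-with-coding' and its arguments are made up for
illustration.  It binds the coding system only for the duration of the
call, and also clears process-coding-system-alist so that no entry there
can override the default.

```elisp
;; Hypothetical helper, not part of VM.
(defun my-filter-region-with-coding (start end command coding)
  "Replace the region START..END with the output of shell COMMAND.
Input is encoded, and output decoded, using CODING."
  (let ((default-process-coding-system (cons coding coding))
        ;; Assumption: no per-command override is wanted here.
        (process-coding-system-alist nil))
    ;; OUTPUT-BUFFER nil, REPLACE t: the output replaces the region.
    (shell-command-on-region start end command nil t)))
```

VM could then do something like

    (my-filter-region-with-coding start end "w3m -dump -T text/html -I UTF-8" 'utf-8)

where the exact w3m command line is whatever VM builds today; the only
point is that the charset given to -I and the coding system bound around
the call agree.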

Sample mail message available on request...

HTH
R'

