help-gnu-emacs

g-client: character coding problem


From: Joseph Fahey
Subject: g-client: character coding problem
Date: Sun, 13 May 2007 16:55:29 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.94 (gnu/linux)


Hello all,

I am trying to use T. V. Raman's g-client interface to the Google API,
in particular as an interface to Blogger.

http://emacspeak.blogspot.com/2007/03/emacs-client-for-google-services.html#cooliris

G-client, and more specifically gblogger.el, works great for me... as
long as I stick to ASCII characters. Any accented characters from the
iso-latin-1 subgroup show up incorrectly. I've tried a lot of
different things to alter the coding system, mostly by setting
everything I can to utf-8-unix.

I think I have found the problem, but am not sure how to solve it.
gblogger.el uses xsltproc via shell-command-on-region to prepare a
blog, before sending the buffer using curl. It appears that the
characters are coming back malformed. Here is the function in
g-utils.el:

(defsubst g-xsl-transform-region (start end xsl)
  "Replace region by result of transforming via XSL."
  (declare (special g-xslt-program))
  (let ((coding-system-for-write 'utf-8))
    (shell-command-on-region
     start end
     (format "%s %s - %s"
             g-xslt-program xsl (g-xslt-debug))
     'replace)))

If I run the following code (the elisp in the middle of this xml) in a
utf-8 buffer (-u) I get malformed characters (the accents in the
"content" part):

<entry xmlns='http://www.w3.org/2005/Atom'>
  <generator url="http://purl.org/net/emacs-gblogger/">http://purl.org/net/emacs-gblogger/</generator>
  <author> <name>Me </name> </author>
  <title mode="escaped" type="text/html">être </title>
  <content type='xhtml'>
    <div xmlns="http://www.w3.org/1999/xhtml">
<!--content goes here -->
<p>Être ou ne pas être ? 

  (g-xsl-transform-region 
  (point-min) (point-max) "~/elisp/g-client/blogger-edit-post.xsl")

</p>
    </div>
  </content>
</entry>

I get the same results if I change the code to:

 (let ((coding-system-for-write 'utf-8-unix))
  (g-xsl-transform-region
   (point-min) (point-max) "~/elisp/g-client/blogger-edit-post.xsl")) 

Here is what I get if I do "C-h C":

Coding system for saving this buffer:
  Not set locally, use the default.
Default coding system (for new files):
  u -- utf-8-unix (alias of mule-utf-8-unix)

Coding system for keyboard input:
  nil
Coding system for terminal output:
  u -- utf-8 (alias of mule-utf-8)

Defaults for subprocess I/O:
  decoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)

  encoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)


Priority order for recognizing coding systems when reading files:
  1. iso-latin-1 (alias: iso-8859-1 latin-1)
  2. windows-1252 (alias: cp1252)
  3. mule-utf-8 (alias: utf-8) ... etc.

I suspect that the problem is coming from the defaults for subprocess
I/O, but I'm not sure how to change that.
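
One avenue that might be worth trying, as a sketch rather than a
confirmed fix: the docstring of shell-command-on-region says that
noninteractive callers can specify coding systems by binding
coding-system-for-read and coding-system-for-write. The function
above binds only the write side, so the output of xsltproc would
presumably still be decoded with the iso-latin-1 default. The wrapper
below binds both. Its name, my-shell-region-utf8, is made up for
illustration and is not part of g-client, and "cat" in the usage
comment merely stands in for the real xsltproc command line.

```elisp
;; Hypothetical helper, NOT part of g-client: run COMMAND on the region
;; with UTF-8 forced in both directions for this one call only.
(defun my-shell-region-utf8 (start end command)
  "Replace the text between START and END with the output of COMMAND.
Text is sent to the subprocess as UTF-8 and its output is decoded as
UTF-8, regardless of the defaults for subprocess I/O."
  (let ((coding-system-for-write 'utf-8)  ; encoding of text sent to COMMAND
        (coding-system-for-read 'utf-8))  ; decoding of COMMAND's output
    ;; OUTPUT-BUFFER = t and REPLACE = t: put the output in the current
    ;; buffer, in place of the region.
    (shell-command-on-region start end command t t)))

;; Usage, with "cat" as a stand-in command that echoes its input:
;; (my-shell-region-utf8 (point-min) (point-max) "cat")
```

If this is the right track, the same two-variable let could presumably
be wrapped around the call to g-xsl-transform-region instead, since
both variables are dynamically bound. Changing
default-process-coding-system, a (DECODING . ENCODING) pair, would be
the global alternative, though it affects every subprocess.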

So... here I am at wit's end. Any ideas would be much appreciated.

thanks

Joe

PS: this is on Linux with GNU Emacs 22.0.94.2. Here is the output from locale:

LANG=fr_FR.UTF-8@euro
LC_CTYPE=fr_FR.UTF-8
LC_NUMERIC="fr_FR.UTF-8@euro"
LC_TIME="fr_FR.UTF-8@euro"
LC_COLLATE="fr_FR.UTF-8@euro"
LC_MONETARY="fr_FR.UTF-8@euro"
LC_MESSAGES=C
LC_PAPER="fr_FR.UTF-8@euro"
LC_NAME="fr_FR.UTF-8@euro"
LC_ADDRESS="fr_FR.UTF-8@euro"
LC_TELEPHONE="fr_FR.UTF-8@euro"
LC_MEASUREMENT="fr_FR.UTF-8@euro"
LC_IDENTIFICATION="fr_FR.UTF-8@euro"
LC_ALL=

