emacs-devel

Re: encoding and content-length for url-http.el


From: Kenichi Handa
Subject: Re: encoding and content-length for url-http.el
Date: Thu, 16 Jun 2005 16:05:50 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

In article <address@hidden>, "Mark A. Hershberger" <address@hidden> writes:
[...]
> I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.

> However, to eliminate the reliance on external code, I've pulled the bit
> from xml-rpc.el that makes the call to post to a weblog hosted on
> Blogger.com:

>         (let ((url-debug t)) (setq url-request-data "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>")
        
[...]
>                   result))))))

> Without the patch that I supplied, this results in a server error:
> "unexpected end of file found"

> With the patch, it works perfectly.  The result can be seen at
> http://emacs-weblogger.blogspot.com/

In the code above, you set url-request-data to a multibyte
string.

All non-ASCII characters in "Iñtërnâtiônàlizætiøn" belong to
iso-8859-1, and Emacs internally represents each non-ASCII
iso-8859-1 character in 2 bytes.  That means string-bytes on
url-request-data returns, by chance, the same byte length as
the result of encoding it in utf-8.

(= (string-bytes "Iñtërnâtiônàlizætiøn")
   (length (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8))
   27)
  => t

That's why your change to url-http.el works for the above
case.  But that is just a coincidence.  If the string
contains, for instance, an Ethiopic character, it doesn't
work.
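
To make the point above concrete, here is a minimal sketch that
counts bytes only after an explicit encoding step, so it does not
depend on how Emacs stores characters internally.  The helper and
variable names are hypothetical and not part of url-http.el, and
the Ethiopic sample (U+1200) is just one arbitrary character
chosen for illustration.

```elisp
;; Hypothetical helper for illustration; not part of url-http.el.
(defun my-utf-8-byte-length (string)
  "Return the number of bytes STRING occupies once encoded as UTF-8."
  (length (encode-coding-string string 'utf-8)))

;; "Iñtërnâtiônàlizætiøn", written with \u escapes so that the
;; encoding of this source file does not matter.
(defconst my-latin-sample
  "I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n")

;; A single Ethiopic character (U+1200), an arbitrary example.
(defconst my-ethiopic-sample "\u1200")

(length my-latin-sample)                  ; 20 characters
(my-utf-8-byte-length my-latin-sample)    ; 27 bytes on the wire
(length my-ethiopic-sample)               ; 1 character
(my-utf-8-byte-length my-ethiopic-sample) ; 3 bytes on the wire
```

In both cases the character count differs from the encoded byte
count, and the byte count is the value a server expects in
Content-length:.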

What I still don't know is what kind of value
url-request-data should hold.

If it should be an already encoded string (making it the
caller's responsibility to pre-encode the string), just using
`length' as now is ok.  And you can use this kind of code:

        (let ((url-debug t))
          (setq url-request-data
                (encode-coding-string
                 "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>"
                 'utf-8))

Please try it after cancelling your change.

If it should be a multibyte string, the correct way to
calculate Content-length: is to use this code:
  (length (encode-coding-string "Iñtërnâtiônàlizætiøn"
                                url-request-coding-system))
together with your patch that introduces url-request-coding-system.
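
The two possibilities discussed above (a pre-encoded string
versus a multibyte string) could both be handled by one helper
along the following lines.  This is only a sketch under stated
assumptions: my-content-length and my-request-coding-system are
hypothetical names, the latter standing in for the
url-request-coding-system variable from the patch, and nothing
here is taken from url-http.el itself.

```elisp
;; Both names below are hypothetical; neither exists in url-http.el.
(defvar my-request-coding-system 'utf-8
  "Stand-in for the url-request-coding-system variable from the patch.")

(defun my-content-length (data)
  "Return the number of bytes DATA will occupy when sent.
A unibyte DATA is assumed to be already encoded by the caller, so
its length is its byte count.  A multibyte DATA is encoded with
`my-request-coding-system' first and the result is measured."
  (length (if (multibyte-string-p data)
              (encode-coding-string data my-request-coding-system)
            data)))

;; Multibyte input: encoded before counting.
(my-content-length "I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n")

;; Pre-encoded (unibyte) input: counted as is.
(my-content-length
 (encode-coding-string
  "I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n" 'utf-8))
```

Both calls return 27, the same value as in the comparison earlier
in this message.  Whether the library or the caller should do the
encoding is still the open question; the sketch merely shows that
either choice can yield a correct byte count.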

Anyway, this change of yours:

> +     (set-process-coding-system connection
> +                                (detect-coding-string url-request-data t)
> +                                url-request-coding-system)

is bad, as Stefan wrote.  The second arg (the coding system
used for decoding) must be `binary', and we should decode the
received data according to its contents (perhaps by parsing
the header to detect which charset is specified, and falling
back to Emacs' code detection routine).

---
Kenichi Handa
address@hidden



