bug-gnu-emacs

bug#6252: Emacs does not implement URL (aka "percent") decoding correctly


From: José A. Romero L.
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Sun, 23 May 2010 01:46:54 +0200

On May 18, 20:14, Xah Lee <xah...@gmail.com>  wrote:

> is there emacs lisp function that decode the url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's a EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
>   (require 'gnus-util)
>  gnus-url-unhex-string
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)

It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard recommends
that characters outside the unreserved set be converted to UTF-8 bytes,
which are then percent-encoded. Even though the encoding part looks OK
(in url-util.el), the decoding does not go that last mile to interpret
the decoded bytes as UTF-8.
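
To make the byte-level picture concrete, here is a small sketch of the
round trip for the EN DASH from the example URL. The variable name
my-en-dash is only an illustration, not anything defined by Emacs:

```elisp
;; U+2013 EN DASH, built from its code point so the example does not
;; depend on how this file itself is encoded.  `my-en-dash' is a
;; hypothetical name used only for this illustration.
(defvar my-en-dash (string #x2013))

;; Encoding the character as UTF-8 yields the three bytes that show up
;; in the URL as %E2%80%93.
(string-to-list (encode-coding-string my-en-dash 'utf-8))
;; => (226 128 147), i.e. #xE2 #x80 #x93

;; Decoding those raw bytes as UTF-8 gives back the single character.
(decode-coding-string (unibyte-string #xE2 #x80 #x93) 'utf-8)
```

Unhexing alone stops at the three separate bytes; it is the final
decode step that turns them back into one character.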

Until a proper implementation is done, I guess you could work around
the problem with something like this:

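    (require 'url-util)

    (decode-coding-string
     ;; Rebuild the unhexed result as a unibyte string of raw bytes.
     (apply 'unibyte-string
            (string-to-list
             (url-unhex-string
              "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
     ;; Interpret those bytes as UTF-8.
     'utf-8)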

(yes, it's ugly as hell but hey, it's free ;])
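
If you need this in more than one place, the same workaround could be
wrapped in a small helper. This is only a sketch: the name
my-url-decode-utf8 is made up for the example, and it assumes the input
is a plain ASCII URL whose percent-escapes encode UTF-8 bytes:

```elisp
(require 'url-util)

(defun my-url-decode-utf8 (url)
  "Percent-decode URL and interpret the resulting bytes as UTF-8.
`my-url-decode-utf8' is a hypothetical helper, not part of Emacs.
Assumes URL contains only ASCII characters and percent-escapes."
  (decode-coding-string
   ;; Force a unibyte string so every element is treated as a raw byte.
   (apply #'unibyte-string (string-to-list (url-unhex-string url)))
   ;; Reinterpret the raw bytes as UTF-8 text.
   'utf-8))

(my-url-decode-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
```

Escapes that do not form valid UTF-8 are not handled specially here, so
URLs produced with some other encoding may still come out wrong.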

I've just sent this very message as a bug report to the Emacs team.

Cheers,
-- 
José A. Romero L.
escherdragon@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)




