
[Orgmode] Re: org-feed XML entities and character encoding


From: Michael Brand
Subject: [Orgmode] Re: org-feed XML entities and character encoding
Date: Fri, 13 Aug 2010 21:03:52 +0200
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

Hi David

On 10-08-13 17:59 , David Maus wrote:
>> 2. request for help about an issue with multibyte character encoding
>> ====================================================================
>>
>> There is an issue with multibyte characters that appear in the input
>> as unescaped, multibyte-encoded characters (not as XML entities; as XML
>> entities, multibyte characters are simply substituted correctly). I
>> looked for an example with a character encoding specified in the first
>> line of the XML feed, like
>> <?xml version="1.0" encoding="utf-8"?>
>> and found one here:
>> http://www.openscreencast.de/blog/rss.xml
>> [...]
>
> The problem with this feed is that it contains raw UTF-8 encoded
> characters that must be decoded before they can be properly inserted
> in the target buffer.
>
> The attached patch does this by explicitly decoding new entries
> according to their detected character encoding.
>
> Btw., a helpful introduction to the topic is
>
> The Absolute Minimum Every Software Developer Absolutely, Positively
> Must Know About Unicode and Character Sets (No Excuses!)
>
> by Joel Spolsky
>
> http://www.joelonsoftware.com/articles/Unicode.html

Thank you very much for your patch. It resolves this issue with
org-feed.el as expected. I tested it with the two feeds
http://www.openscreencast.de/blog/rss.xml  (declared utf-8)
and
http://pod.drs.ch/world_music_special_mpx.xml  (not declared utf-8),
which I described in more detail earlier, and with a dozen other feeds,
all of them encoded in utf-8.
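
For readers outside Emacs, the idea of decoding entries according to
their detected encoding can be sketched roughly as follows. This is an
illustration in Python, not the patch itself: the helper names
detect_encoding and decode_feed are made up for this example, and the
fallback to utf-8 when no encoding is declared is an assumption that
merely matches the two feeds mentioned above.

```python
import re

# Hypothetical helpers for illustration only; they are not part of org-feed.el.
# Matches an encoding="..." attribute inside a leading XML declaration.
_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, else `default`."""
    m = _DECL.match(raw[:200])
    return m.group(1).decode("ascii").lower() if m else default


def decode_feed(raw: bytes) -> str:
    """Decode raw feed bytes into text using the detected encoding.

    Undecodable bytes are replaced rather than raising; an encoding name
    unknown to Python would still raise LookupError.
    """
    return raw.decode(detect_encoding(raw), errors="replace")


# The two bytes \xc3\xbc are the UTF-8 encoding of "ü".
print(decode_feed(b'<?xml version="1.0" encoding="utf-8"?><t>\xc3\xbc</t>'))
# prints: <?xml version="1.0" encoding="utf-8"?><t>ü</t>
```

Inserting the raw bytes without such a decoding step is what produces
garbled multibyte characters in the target.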

Michael


