2. Request for help about an issue with multibyte character encoding
====================================================================
There is an issue with multibyte characters that appear in the input
as raw, multibyte-encoded characters rather than as XML entities
(multibyte characters given as XML entities are substituted
correctly). I looked for an example feed that specifies a character
encoding in the first line of the XML, like
<?xml version="1.0" encoding="utf-8"?>
and found one here:
http://www.openscreencast.de/blog/rss.xml
[...]
The problem with this feed is that it contains raw, undecoded
multibyte sequences, which must be decoded according to the feed's
encoding (here utf-8) before they can be inserted properly into the
target buffer.
The attached patch does this by explicitly decoding new entries
according to their detected character encoding.
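
The patch itself is not reproduced here. As a rough illustration of
the idea only, the following Python sketch reads the encoding from the
XML declaration and decodes the raw bytes with it. The function names
(detect_xml_encoding, decode_feed), the utf-8 fallback and the sample
feed are assumptions made for this example and are not taken from the
patch.

```python
import re


def detect_xml_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default`.

    Hypothetical helper: it only inspects the declaration at the very
    start of the data and ignores byte-order marks and HTTP headers.
    """
    match = re.match(rb"<\?xml[^>]*\bencoding=[\"']([A-Za-z0-9._-]+)[\"']", raw)
    return match.group(1).decode("ascii") if match else default


def decode_feed(raw: bytes) -> str:
    """Decode raw feed bytes according to their declared encoding."""
    return raw.decode(detect_xml_encoding(raw), errors="replace")


# Made-up sample feed containing raw (non-entity) multibyte characters.
raw_feed = (
    '<?xml version="1.0" encoding="utf-8"?><title>Grüße</title>'
).encode("utf-8")

# Treating every byte as one character shows the symptom: each
# multibyte character turns into several unrelated characters.
undecoded = raw_feed.decode("latin-1")

# Decoding with the detected encoding restores the original text.
decoded = decode_feed(raw_feed)
```

Here `undecoded` contains the garbled form of the title, while
`decoded` contains the intended characters. Characters written as XML
entities are plain ASCII in the raw data, which would explain why they
are not affected by the problem.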
Btw., a helpful introduction to the topic is
The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)
by Joel Spolsky
http://www.joelonsoftware.com/articles/Unicode.html