dotgnu-pnet

[Pnet-developers] [bug] resgen produces junk for valid PO files


From: Bruno Haible
Subject: [Pnet-developers] [bug] resgen produces junk for valid PO files
Date: Thu, 18 Dec 2003 15:42:43 +0100
User-agent: KMail/1.5

Using pnet-0.6.0.

The input is an ISO-8859-1 encoded PO file, which is valid, as "msgfmt -c"
confirms. The output .resx file is invalid: its header declares UTF-8, but the
contents are in fact ISO-8859-1 encoded!

$ uudecode <<\EOF
begin 644 de.po
M;7-G:60@(B(*;7-G<W1R("(B"B)0<F]J96-T+4ED+59E<G-I;VXZ(&AE;&QO
M(#`N,%QN(@HB4F5P;W)T+4US9VED+4)U9W,M5&\Z(&)U9RUG;G4M9V5T=&5X
M=$!G;G4N;W)G7&XB"B)03U0M0W)E871I;VXM1&%T93H@,C`P,RTP.2TR,B`Q
M,#HQ-BLP,C`P7&XB"B)03RU2979I<VEO;BU$871E.B`R,#`S+3$Q+3$Y(#$V
M.C,U*S`Q,#!<;B(*(DQA<W0M5')A;G-L871O<address@hidden;"!%:6-H=V%L9&5R
M(#QK94!S=7-E+F1E/EQN(@HB3&%N9W5A9V4M5&5A;address@hidden;6%N(#QD94!L
M:2YO<F<^7&XB"B)-24U%+59E<G-I;VXZ(#$N,%QN(@HB0V]N=&5N="U4>7!E
M.B!T97AT+W!L86EN.R!C:&%R<V5T/4E33RTX.#4Y+3%<;B(*(D-O;G1E;G0M
M5')A;G-F97(M16YC;V1I;F<Z(#AB:71<;B(*(E!L=7)A;"U&;W)M<SH@;G!L
M=7)A;',],CL@<&QU<F%L/2AN("$](#$I.UQN(@H*;7-G:60@(DAE;&QO(%=O
?<FQD(2(*;7-G<W1R(")(address@hidden"$B"@``
`
end
EOF
$ msgfmt -c de.po && echo ok
ok
$ resgen de.po hello.de.resx
$ head -1 hello.de.resx
<?xml version="1.0" encoding="utf-8"?>
$ iconv -f utf-8 -t utf-8 hello.de.resx > /dev/null 
iconv: illegal input sequence at position 1794
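
For readers without iconv at hand, the same check can be sketched in Python. This is only an illustration, not part of pnet or resgen: the function names are made up here, and the sample bytes are an arbitrary ISO-8859-1 string, not the contents of hello.de.resx. It reports the offset of the first byte that is not valid UTF-8, and shows what a correct ISO-8859-1 to UTF-8 transcoding would produce instead.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None if valid.

    Hypothetical helper; it relies on Python's strict built-in UTF-8 decoder.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8 (what the output file should contain)."""
    return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    sample = b"caf\xe9!"  # arbitrary ISO-8859-1 sample, 0xE9 is e-acute
    print(first_invalid_utf8_offset(sample))                  # 3
    print(first_invalid_utf8_offset(latin1_to_utf8(sample)))  # None
```

A file that declares encoding="utf-8" should pass the first check; one that was copied byte for byte from ISO-8859-1 input will fail at the first non-ASCII character.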

And when this invalid .resx file is fed to resgen again, the result is:

$ resgen hello.de.resx de.po
$ cat de.po
msgid ""
msgstr ""
"Project-Id-Version: hello 0.0\n"
"Report-Msgid-Bugs-To: address@hidden"
"POT-Creation-Date: 2003-09-22 10:16+0200\n"
"PO-Revision-Date: 2003-11-19 16:35+0100\n"
"Last-Translator: Karl Eichwalder <address@hidden>\n"
"Language-Team: German <address@hidden>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Hello World!"
msgstr "Hall\U001A3A25n Welt!"

Here you can see a Unicode character that is out of range! (Unicode
ends at \U0010FFFF; anything from \U00110000 upward is invalid.) It seems
the UTF-8 parser is missing some checks against invalid UTF-8 input.
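
To illustrate the kind of missing checks meant here, the following Python sketch contrasts a deliberately lax UTF-8 decoder with a strict one. It is a guess at the failure mode, not resgen's actual code. It also assumes the translation was "Hallöchen Welt!" in ISO-8859-1; that string is an assumption, chosen because it reproduces the \U001A3A25 seen above. All function names are hypothetical.

```python
MAX_CODE_POINT = 0x10FFFF  # last valid Unicode code point


def lax_decode(data: bytes) -> list:
    """Decode UTF-8 WITHOUT validation (buggy on purpose).

    It trusts the lead byte's length, never checks that the following
    bytes are continuation bytes (10xxxxxx), and never checks the range.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            cp, extra = lead & 0x07, 3
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            cp, extra = lead & 0x1F, 1
        else:                         # ASCII or anything else: taken as-is
            cp, extra = lead, 0
        for b in data[i + 1:i + 1 + extra]:
            cp = (cp << 6) | (b & 0x3F)   # no continuation-byte check
        out.append(cp)
        i += 1 + extra
    return out


def is_unicode_scalar(cp: int) -> bool:
    """Range check the lax decoder lacks: at most U+10FFFF, no surrogates."""
    return 0 <= cp <= MAX_CODE_POINT and not (0xD800 <= cp <= 0xDFFF)


def escape(code_points) -> str:
    """Render non-ASCII code points as \\UXXXXXXXX, like the PO output above."""
    return "".join(chr(c) if c < 0x80 else "\\U%08X" % c for c in code_points)


if __name__ == "__main__":
    # Assumed original translation, encoded as ISO-8859-1 (0xF6 is o-umlaut).
    raw = "Hallöchen Welt!".encode("iso-8859-1")
    decoded = lax_decode(raw)
    print(escape(decoded))                          # Hall\U001A3A25n Welt!
    print(all(map(is_unicode_scalar, decoded)))     # False
    try:
        raw.decode("utf-8")                         # strict decoder
    except UnicodeDecodeError as err:
        print("rejected at byte", err.start)        # rejected at byte 4
```

Under that assumption, the byte 0xF6 is taken as the lead byte of a 4-byte sequence, the next three ASCII bytes are swallowed as continuation bytes, and the result is 0x1A3A25. A decoder that verified the continuation bytes or the final range would have rejected the input instead.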



