[bug-gnu-libiconv] MS-ANSI query
From: Peter Flynn
Subject: [bug-gnu-libiconv] MS-ANSI query
Date: Mon, 27 Apr 2009 13:49:38 +0100
User-agent: Thunderbird 2.0.0.21 (X11/20090318)
I'm writing an RSS feed from a LISTSERV list fed by internal (Exchange)
mail. The messages could have any content transfer encoding (most are
Base64 or quoted-printable, and are handled by mewdecode) and any
charset, but in all cases the resulting text is passed through iconv to
make it UTF-8.
This doesn't work for messages sent in the undistinguished "8BIT" which
appears from examination to be Windows ANSI in all local cases so far
(so I'm prepared to live with that assumption).
The management of the feed is done in a Bash shell script and gawk under
RHEL5 and the resulting UTF-8 XML is passed to Cocoon for feed
generation. The gawk script passes the text through the pipeline
    <stuff> | iconv -f <charset> -t utf8 >more stuff
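A minimal sketch of that conversion step as a shell function, for illustration only. The function name `to_utf8` is hypothetical, and the fallback to `CP1252` when no charset is given is an assumption standing in for the "Windows ANSI" guess described above; it is not taken from the actual script.

```shell
# Hypothetical helper: read text on stdin, write UTF-8 on stdout.
# $1 is the source charset; if it is empty or missing, fall back to
# CP1252 (an assumption for undeclared 8-bit mail, not a general rule).
to_utf8() {
    iconv -f "${1:-CP1252}" -t UTF-8
}

# Example: the four bytes "caf" + 0xE9, declared as ISO-8859-1.
printf 'caf\351\n' | to_utf8 ISO-8859-1
```

Keeping the charset as a parameter means the same function serves both the messages that declare a charset and the ones that do not.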
For the 8BIT messages I have tried MS-ANSI but this fails: an example
error message is
    iconv: illegal input sequence at position 1
for an input stream where the second byte (pos 1) is 0xF3 (lowercase
letter o with acute accent). Testing the other ANSI values for the -f
parameter, I find:
    ANSI_X3.4          same error message
    ANSI_X3.4-1968     same error message
    ANSI_X3.4-1986     same error message
    ANSI_X3.110        no error, but converts to UTF-8 0xC3 0xB0 (lowercase eth)
    ANSI_X3.110-1983   same as ANSI_X3.110
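The single-byte test described above can be reproduced in isolation, roughly as follows. This is a sketch: the function name `hex_via` is hypothetical, and `CP1252` is assumed to be a name the local iconv accepts for the Windows Western European code page. Which names are available varies between iconv implementations, so `iconv -l` is worth checking first.

```shell
# Hypothetical helper: push the single byte 0xF3 (octal 363) through
# iconv using the source charset named in $1, and print the resulting
# UTF-8 bytes as lowercase hex with no separators.
hex_via() {
    printf '\363' | iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

hex_via CP1252; echo    # c3b3, i.e. U+00F3, o with acute accent
```

A result of c3b0 instead would be U+00F0 (eth), matching what ANSI_X3.110 produced in the list above.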
I'm not sufficiently familiar with the internals of ANSI 8-bit encodings
to know if this is correct (and I therefore have something else
undefined) or if it's a bug.
///Peter