mhfixmsg character set conversion
From: Steven Winikoff
Subject: mhfixmsg character set conversion
Date: Thu, 03 Feb 2022 22:42:21 -0500
I routinely use mhfixmsg to clean up incoming messages, using this command
in a shell script invoked through procmail:
mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
-reformat -fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -noverbose -file - \
-outfile $destination < $source
This usually does what I expect, but the other day I received a message
with these characteristics:
- mhlist reports the following structure:

       msg part  type/subtype              size description
        72       multipart/alternative      45K
           1     text/html                  42K
           2     text/plain                1501
- the top level of the incoming message has this header (before
  mhfixmsg):

      Content-Type: multipart/alternative; boundary=01266[...]

- the alternative parts have these headers:

      Content-Transfer-Encoding: quoted-printable
      Content-Type: text/plain; charset=iso-8859-1

  and

      Content-Transfer-Encoding: quoted-printable
      Content-Type: text/html; charset=iso-8859-1

- after mhfixmsg, the top-level header is unchanged, as expected; the
  alternative part headers are changed to

      Content-Transfer-Encoding: 8bit
      Content-Type: text/plain; charset="UTF-8"

  and

      Content-Transfer-Encoding: 8bit
      Content-Type: text/html; charset=iso-8859-1
...but after conversion from iso-8859-1 to UTF-8, the output file is
mangled.
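One common way that converted text ends up "mangled" is a mismatch between a body's bytes and its declared charset, for example a body whose bytes are UTF-8 while its header still says charset=iso-8859-1 (as the text/html header above still does). That is an assumption about the failure mode, not a diagnosis of mhfixmsg or of this message; the Python sketch below only reproduces what such a mismatch looks like. The helper name `misread_utf8_as_latin1` is hypothetical.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as iso-8859-1.

    This simulates a mail reader trusting a stale "charset=iso-8859-1"
    label on a body whose bytes are actually UTF-8 (hypothetical scenario).
    """
    utf8_bytes = text.encode("utf-8")
    # Every byte is a valid iso-8859-1 character, so decoding never fails;
    # each accented letter (two UTF-8 bytes) becomes two wrong characters.
    return utf8_bytes.decode("iso-8859-1")


if __name__ == "__main__":
    print(misread_utf8_as_latin1("répondre"))  # prints "rÃ©pondre"
```

Plain ASCII passes through unchanged, which is why this kind of damage shows up only on accented letters.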
For reference, here's a section of the quoted-printable encoding from the
original message:
    Veuillez ne pas r=E9pondre au pr=E9sent courriel. Il a =E9t=E9 g=E9n=E9r=E9=
    automatiquement, nous ne pourrons pas y donner suite.
This should decode to the following (represented in UTF-8):
    Veuillez ne pas répondre au présent courriel. Il a été généré
    automatiquement, nous ne pourrons pas y donner suite.
(all in one line, but split here for readability).
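The quoted-printable excerpt can be checked by hand: each `=XX` is one byte written in hexadecimal (`=E9` is é in iso-8859-1), and a `=` at the very end of a line is a soft line break, which is why the decoded text is a single line. Below is a minimal Python sketch of the expected decode-then-convert sequence using the standard `quopri` module. The function name `qp_latin1_to_utf8` is illustrative, and the sample input is a shortened fragment, not the full original line.

```python
import quopri


def qp_latin1_to_utf8(encoded: bytes) -> bytes:
    """Decode a quoted-printable iso-8859-1 body and re-encode it as UTF-8."""
    # Step 1: undo the transfer encoding (=XX escapes and "=" soft line breaks).
    raw = quopri.decodestring(encoded)
    # Step 2: interpret the raw bytes using the declared charset.
    text = raw.decode("iso-8859-1")
    # Step 3: produce UTF-8 bytes; this conversion should happen exactly once.
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"Il a =E9t=E9 g=E9n=E9r=E9=\n automatiquement."
    print(qp_latin1_to_utf8(sample).decode("utf-8"))
    # prints "Il a été généré automatiquement."
```

If step 3 were applied a second time to text that is already UTF-8, the result would be the kind of double-encoded output that the previous sketch shows.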
...but mhfixmsg turns that into
    Veuillez ne pas répondre au présent courriel. Il a été généré
    automatiquement, nous ne pourrons pas y donner suite.
(also all in one line, but split here for readability).
Not that I care very much about this particular boilerplate sentence :-/,
but the message contained a lot of other text that I do care about, all of
which was mangled in the same way.
My questions, then, are:
1) Is this a bug in mhfixmsg, or am I just using it incorrectly?
2) If the former, is there further information I can supply to help track
this down, or further tests I can conduct on the message in question?
3) ...or if the latter, what am I doing wrong, and what should I be doing
instead?
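Regarding question 2, one test that might narrow things down (a suggestion, not an nmh feature) is to check whether a part's body bytes after mhfixmsg are valid UTF-8 even though its header still declares iso-8859-1, or the reverse. The sketch below assumes the body has already been extracted from the message and the charset value has been unquoted; the function name and result strings are hypothetical.

```python
def classify_body(body: bytes, declared_charset: str) -> str:
    """Report whether body bytes agree with their declared charset.

    Heuristic only: pure ASCII fits either label, and any byte sequence is
    technically valid iso-8859-1, so only UTF-8 validity is really tested.
    """
    try:
        body.decode("ascii")
        return "ascii"  # no 8-bit bytes, so the label cannot matter
    except UnicodeDecodeError:
        pass
    try:
        body.decode("utf-8")
        looks_utf8 = True
    except UnicodeDecodeError:
        looks_utf8 = False
    declared_utf8 = declared_charset.lower() in ("utf-8", "utf8")
    if looks_utf8 and not declared_utf8:
        return "mismatch: UTF-8 bytes labeled " + declared_charset
    if not looks_utf8 and declared_utf8:
        return "mismatch: non-UTF-8 bytes labeled UTF-8"
    return "consistent"


if __name__ == "__main__":
    print(classify_body("répondre".encode("utf-8"), "iso-8859-1"))
    # prints "mismatch: UTF-8 bytes labeled iso-8859-1"
```

Running this on each alternative part, before and after mhfixmsg, would show which part (if any) carries bytes that disagree with its label.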
Thanks,
- Steven
--
___________________________________________________________________________
Steven Winikoff |
Montreal, QC, Canada | Aleph-null bottles of beer on the wall,
smw@smwonline.ca | Aleph-null bottles of beer...
http://smwonline.ca |