Re: mhfixmsg character set conversion
From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Sat, 12 Feb 2022 01:48:36 -0500
>I would do this if you haven't already:
>1. download nmh HEAD, build, and install somewhere
>2. move your $(mhpath +)/mhn.defaults
>3. move your profile and create one with just a Path: entry
>4. run the "mhfixmsg -file original_copy -out -" from 1. and see if the
> output looks good or bad
I just tried this, and a couple of other things, but only after installing
par 1.53.0 from source and using that to replace the AUR binary. Here's
what I learned:
1) Replacing par does indeed fix one of the three failed tests. I can
send you the details, but I seem to recall that you already have them
from Valdis Klētnieks; please let me know if I should forward them
anyway.
2) After running make install, the newly built mhfixmsg produces correct
output. But so does nmh-1.7.1 mhfixmsg when compiled without my patch.
3) Step (3) above was the key, and it turned out that I was being misled
by this .mh_profile entry:
    mhshow-show-text/html: html_to_text %F | cat -
...where html_to_text is a shell script that basically just runs this
command:
    elinks -force-html -dump -dump-charset utf-8 ${html}
Removing this profile entry causes the message to be displayed
correctly -- both the original, unmodified version, and the one that
was saved after being converted by my patched version of nmh-1.7.1
mhfixmsg. That's pretty conclusive evidence that I'd been looking
in the wrong place all along. :-(
The man page for elinks describes -dump-charset as follows:
-dump-charset (alias for document.dump.codepage)
Codepage used when formatting dump output.
Interestingly, when I restored the mhshow-show-text/html .mh_profile
entry and modified my shell script to run elinks without this option,
I still saw the same doubly encoded output.
So next I tried passing the character set to my script as follows:
    mhshow-show-text/html: html_to_text %{charset} %F
...and changed the script to use the provided character set rather
than forcing utf-8:
    elinks -force-html -dump -dump-charset $1 ${html}
This failed differently: rather than marking undisplayable characters
with '�', it used '*'. Somehow, I don't consider that to be much of an
improvement. :-/
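For reference, one common way to end up with "doubly encoded" output is
for text that is already UTF-8 to be treated as ISO-8859-1 and converted
to UTF-8 a second time. Whether that is what happens inside elinks here
is an assumption, not something established above; the sketch below only
reproduces the symptom with iconv. The function names are made up for
illustration.

```shell
#!/bin/sh
# "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
utf8_e_acute() { printf '\303\251'; }

# Simulate the mistake: read the input as ISO-8859-1 and emit UTF-8,
# even though the input was UTF-8 to begin with.
double_encode() { iconv -f ISO-8859-1 -t UTF-8; }

# Show stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

utf8_e_acute | hex                  # c3a9      (correct: one character)
utf8_e_acute | double_encode | hex  # c383c2a9  (displays as "Ã©")
```

Running a sample of the garbled output through "iconv -f UTF-8 -t ISO-8859-1"
and getting readable text back would suggest this is the failure in play.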
...so clearly I need to replace elinks in my html_to_text script, and doing
that will solve the problem that prompted this discussion, leaving the
following questions:
1) What's the best replacement for elinks?
2) Should I replace my 1.7.1 installation with the version I just built?
Basically I'm asking what benefits the current snapshot has over
1.7.1, and how far away the next numbered release might be.
3) How can I guarantee that messages will be saved with quoted-printable
or base64 parts decoded, without patching mhfixmsg to deal with
messages in which the decoded text would contain lines more than 998
characters long?
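As an aside on question 3: the 998 figure is the maximum line length that
RFC 5322 allows, not counting the CRLF. A rough way to see whether a given
decoded part runs into that limit is sketched below. long_lines is a
made-up helper name, and the sketch assumes the part has already been
decoded and saved to a file by some other means.

```shell
#!/bin/sh
# Print "line-number: length" for every line longer than 998 bytes.
# LC_ALL=C makes awk count bytes rather than multibyte characters.
long_lines() {
    LC_ALL=C awk -v max=998 '
        { sub(/\r$/, "") }    # do not count a trailing CR, if present
        length($0) > max { printf "%d: %d\n", NR, length($0) }
    ' "$1"
}
```

For example, "long_lines decoded_part.txt" prints nothing when every line
fits within the limit.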
I used the current mhfixmsg with the test message I've been using
throughout this discussion, with this command line:
    /tmp/nmh/root/bin/mhfixmsg \
        -decodeheaderfieldbodies utf-8 -decodetext binary \
        -decodetypes text -textcharset UTF-8 -reformat \
        -fixcte -fixboundary -noreplacetextplain \
        -fixtype application/octet-stream \
        -verbose -file $source -outfile $destination
...and that resulted in these headers after decoding:
- for the text/plain part:
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="UTF-8"
- for the text/html part:
    Content-Transfer-Encoding: binary
    Content-Type: text/html; charset=iso-8859-1
That raises some further questions:
- Why wasn't the text/html part converted to utf-8?
- Regardless of the answer to the previous question, after a
message has been refiled (and assuming I'm not planning to
resend it to anyone), is there a practical difference between
binary and 8bit encoding?
- Why are the headers of the decoded message identical to those
of the input, despite the use of -decodeheaderfieldbodies?
(...and yes, the unmodified version of the message does contain
some encoded headers that my decode_headers program found and
decoded; mhfixmsg appears not to have done so).
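A quick way to confirm that observation without a separate decoder is to
look for RFC 2047 encoded-words ("=?charset?Q?...?=" or "=?charset?B?...?=")
that survive in the header block of the output. This is only a sketch:
encoded_headers is a hypothetical helper, and it does not unfold
continuation lines, so it reports the physical line containing the
encoded-word.

```shell
#!/bin/sh
# Print header lines that still contain an RFC 2047 encoded-word.
# Only the header block (everything before the first blank line)
# is scanned, so encoded-looking text in the body is ignored.
encoded_headers() {
    sed '/^$/q' "$1" | grep -E '=\?[^?]+\?[BbQq]\?[^?]*\?='
}
```

Running it on both $source and $destination from the command above should
show whether any encoded headers were left untouched.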
Thanks,
- Steven
--
___________________________________________________________________________
Steven Winikoff | "'Somebody, SOMEBODY
Montreal, QC, Canada | Has to, you see.'
smw@smwonline.ca | Then she picked out two Somebodies.
http://smwonline.ca | Sally and me."
| - Dr. Seuss