Re: [Nmh-workers] Troublesome messages


From: Ralph Corderoy
Subject: Re: [Nmh-workers] Troublesome messages
Date: Sun, 15 Oct 2017 01:05:56 +0100

Hi Jon,

> > > Don't know if there's anything that can be done about this given
> > > the nature of unicode and all, but I've been getting a lot of spam
> > > recently that looks like this:
> > >
> > > 代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
>
> Not saying that it's not unicode, just that it makes a mess of my
> window.  Using mate-terminal on linux, utf-8 local.
...
> Screws up the display after message 5.

I poked about that email a bit on this UTF-8 xfce4-terminal.

    $ scan -width 0 -forma '%{from}\n%{subject}\n%{body}' .
    =?GB2312?B?wdbPyMn6?= <address@hidden>
    =?GB2312?B?tPq/qreixrE=?=
    ??Ʊ???????Żݡ?????֤?󸶿13651402207Ҷ???? ???һ??
    $

The `%{body}' output is nmh trying to take the GB2312 body as UTF-8,
struggling with many of the bytes, producing a `?' for them instead, but
some GB2312 bytes do happen to form a valid UTF-8 sequence so the odd
`Ʊ' gets invented.
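
As a rough illustration of where the `Ʊ' comes from, here is a minimal Python sketch. It is not what nmh does internally; it only assumes Python's stock `gb2312' and `utf-8' codecs, and the variable names are made up for the example. The GB2312 bytes for 票 are C6 B1, which also happen to be the valid UTF-8 encoding of U+01B1.

```python
# Sketch only: mimic reading GB2312 bytes as if they were UTF-8.
text = "发票"
raw = text.encode("gb2312")          # the bytes as they sit in the message body

# Decode with the wrong charset, substituting U+FFFD for undecodable bytes.
wrong = raw.decode("utf-8", errors="replace")

print(raw.hex(" "))                  # b7 a2 c6 b1
print([f"U+{ord(c):04X}" for c in wrong])
```

The first two bytes cannot start a UTF-8 sequence and are replaced, while the last two decode cleanly to the unrelated `Ʊ'.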

    $ scan -width 0 -forma '%(decode{from})\n%(decode{subject})' .
    林先生 <address@hidden>
    代开发票
    $

`%(decode)' works.
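
For comparison, the same RFC 2047 decoding can be sketched outside nmh with Python's standard `email.header' module. This is only an illustration of what decoding the encoded word involves, not nmh's implementation; the variable names are hypothetical.

```python
from email.header import decode_header

# The encoded Subject exactly as scan printed it without %(decode).
encoded = "=?GB2312?B?tPq/qreixrE=?="

# decode_header() returns (bytes, charset) pairs; base64 is already undone.
(payload, charset), = decode_header(encoded)

# Convert the GB2312 bytes to a Unicode string for display.
subject = payload.decode(charset)
print(subject)
```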

    $ mhstore -outfile -
    ������Ʊ�������Żݡ�����֤�󸶿13651402207Ҷ���� ΢��һ��
    storing message 5 to stdout
    $

This time, nmh gets out the way and just flings the bytes at the TTY.
xfce4-terminal spots they're not valid and shows U+FFFD `�' in their
place; `Ʊ' is still there.

    $ mhstore -outfile - | iconv -f gb2312
    storing message 5 to stdout
    代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
    $

It's valid GB2312 according to iconv(1), which has converted it to UTF-8.
uniq(1) says that's identical to the line you give above.
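
The iconv(1) check can be approximated in Python as below. This is a hedged sketch that assumes Python's `gb2312' codec is as strict as iconv; `is_valid_gb2312' is a hypothetical helper, not part of nmh or iconv.

```python
def is_valid_gb2312(data: bytes) -> bool:
    """Return True if every byte sequence in data decodes as GB2312."""
    try:
        data.decode("gb2312")        # strict by default: raises on bad bytes
    except UnicodeDecodeError:
        return False
    return True

line = "代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致"
body = line.encode("gb2312")         # stand-in for the message body bytes

# Roughly what `iconv -f gb2312' does on a UTF-8 system.
as_utf8 = body.decode("gb2312").encode("utf-8")

print(is_valid_gb2312(body))         # a complete body should validate
print(is_valid_gb2312(body[:1]))     # a lone lead byte should not
```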

    $ mhshow | sed '$! d'
    代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
    $

And that's the same line again, so nmh can do it too.

I think historically there have been various problems with sbr/fmt_scan.c,
e.g. its cpstripped(), and those could have included putting out partial
UTF-8; I don't recall.  You could capture the bytes from the scan that
messes up and send them here.  I've been using

    $ scan -version
    scan -- nmh-1.7-RC3 1.7-RC3-4-g3dfc049a built 2017-09-26 14:24:31 +0000 on orac
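
Once the bytes from a misbehaving scan have been captured, something like the following Python sketch could show whether they contain partial UTF-8. `first_invalid_utf8' is a hypothetical helper and the sample bytes are invented for the example, not taken from real scan output.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending bytes) for the first bad UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None

good = "代开发票".encode("utf-8")
# Made-up sample: a three-byte character cut short after two bytes.
truncated = b"ok " + "代".encode("utf-8")[:2]

print(first_invalid_utf8(good))
print(first_invalid_utf8(truncated))
```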

Also, try xterm instead.  I find it handy when another terminal's
quality is in doubt.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy