nmh-workers

Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility


From: Ralph Corderoy
Subject: Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Date: Mon, 17 Oct 2016 23:22:19 +0100

Hi Ken,

> > Valid UTF-8 and valid GB2312 can share the same sequences,
> > especially if it's just the odd `£' or `拢` in ASCII text.
>
> It was just a suggestion, not one I was particularly crazy about ...
> but not all arbitrary 8-bit sequences are valid UTF-8.

Oh, agreed.

> And it looks like for GB2312 (using the EUC-CN encoding, right?) it
> would be harder, but there are certainly invalid sequences for GB2312.

Yep.  But there are a lot of valid sequences for both that look like
each other.  The UTF-8 encoding of U+00a3, that `£', is 0xc2 0xa3, and
if those two bytes are treated as (EUC-CN) GB2312 they decode to
U+62e2, `拢'.

    $ printf '\x00\xa3' |
    > iconv -f ucs-2be -t utf-8 |
    > iconv -f gb2312 -t ucs-2be |
    > hd
    00000000  62 e2                                             |b.|
    00000002
    $
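
The same ambiguity can be checked from the other direction.  What
follows is only a sketch, assuming an iconv(1) that accepts the charset
names `utf-8', `gb2312' and `ucs-2be', as glibc's does.  The helper
name `decodes_as' is made up for illustration and is not part of nmh.
It asks whether a byte sequence is valid in a given charset; 0xc2 0xa3
passes for both, so a validity check alone can't tell which was meant.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin is a valid byte sequence in
# charset $1.  Assumes iconv knows the names utf-8, gb2312 and ucs-2be.
decodes_as() {
    iconv -f "$1" -t ucs-2be >/dev/null 2>&1
}

# The UTF-8 encoding of U+00a3, written in octal because POSIX printf
# does not guarantee \x escapes.
pound='\302\243'

for cs in utf-8 gb2312; do
    if printf "$pound" | decodes_as "$cs"; then
        echo "$cs: valid"
    else
        echo "$cs: invalid"
    fi
done
```

A lone 0xa3, by contrast, is rejected as UTF-8 because it is a
continuation byte with no lead byte, which is the sort of sequence a
charset check can catch.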

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


