Re: [Nmh-workers] EAI?
From: Ken Hornstein
Subject: Re: [Nmh-workers] EAI?
Date: Sun, 09 Aug 2015 01:58:18 -0400
>Should nmh try to get out in front with email address
>internationalization (EAI)? See resources below.
I've thought about what it would take.
>From the MUA perspective, IIUC, it relies on native support on the
>host to handle unencoded UTF-8 addresses. Would nmh support just be a
>matter of 1) not encoding addresses (controlled by a switch) in
>outgoing messages and 2) when showing a message, indicating that an
>address couldn't be displayed?
I think it's slightly more complicated than that (see below).
>Does anyone have experience using it? Gmail supports it, according
>to the article below.
Judging by how few people seem to have such an address, I think it's still
pretty uncommon, right?
Lyndon writes later:
>Since we require a Posix environment, that means utf8 locale support must
>be in place, thus all the OS bits are there waiting to be used.
>
>But to do this properly we really need to overhaul the code base to
>process everything internally as utf8. That's not a trivial task, but we
>have to do it, sooner or later.
Here are my unformed thoughts:
- It's not so easy to deal with characters that aren't in your native locale
using the POSIX API; xlocale makes this easier, but it's a pain.
- A super-brief scan suggests to me that SMTPUTF8 support is not widespread
at this point. But that will no doubt change.
- Right now our address parser will reject stuff that contains 8-bit
characters; we need to fix that. In fact, we need to throw out that
address parser and get a new one; I made some progress on that using
flex and bison.
- It's unclear to me how much UTF-8 verification a MUA is supposed to deal
with; are we, for example, supposed to check for overlong UTF-8 encodings?
For valid UTF-8 sequences in general?
- I do not believe we have to process everything internally as UTF-8, but I
could be persuaded I'm wrong. The real kicker is the format engine;
right now we sort-of cheat a lot. %(decode) basically does a one-stop
decoding and conversion to the native character set. This has a lot of
advantages, but also means we need to sit down and decide what the
format engine is really supposed to be working on; for example, is the
format engine supposed to be dealing with strings before or after RFC 2047
decoding?
- SMTPUTF8 looks relatively straightforward to implement, at least.
- I would rather not make ICU or IDN a build requirement, but it may be
unavoidable.
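
To make the UTF-8 verification question above a bit more concrete, here is a
minimal sketch in C of what strict checking might look like. It assumes
"valid" means well-formed per RFC 3629: no overlong encodings, no UTF-16
surrogates, nothing above U+10FFFF. The names utf8_valid() and has_8bit() are
hypothetical helpers made up for illustration, not existing nmh functions.
has_8bit() is the sort of test that could decide whether an address needs
SMTPUTF8 at all.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nmh): returns 1 if any byte has the
 * high bit set, i.e. the string is not pure 7-bit ASCII. */
static int has_8bit(const unsigned char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (s[i] & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper (not part of nmh): returns 1 if the n bytes at s
 * are well-formed UTF-8 in the RFC 3629 sense, 0 otherwise. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected after the lead */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead */
        }

        if (n - i - 1 < need)
            return 0;       /* sequence truncated at end of input */
        i++;
        while (need--) {
            if ((s[i] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (s[i] & 0x3F);
            i++;
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
    }
    return 1;
}
```

Whether a MUA should be this strict, or just pass bytes through and let the
MTA complain, is exactly the open question; the sketch only shows that the
strict check itself is small and needs neither ICU nor a locale.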
--Ken