From: Ken Hornstein
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Tue, 11 Aug 2015 14:07:32 -0400
>- Messages should be stored in their original form, i.e., the
> character encoding transformation should only be done for
> display/access purposes.
Completely, 100% agree here.
>- I think using a character encoding library is unavoidable. Is iconv()
> sufficient? If UTF-8 is to be used as the normalized encoding
> format, a library is needed that can transform the various encodings
> into it, and likely from it. Maybe it is not as big an issue as it
> was in the past, but not everyone was sold on Unicode. In my
> mail-related project, I had users that preferred their local character
> encoding formats over anything Unicode related.
Weeeeel .... not exactly. It's not just a transformation issue; if it
was, iconv() would be fine.
The issue in the format engine is that we need to know things about
characters, like whether ' ' is a space (the format engine does space
compression). If the strings are UTF-8, we can't use isspace() on them.
We can't even use iswspace(), because that requires the locale to be set
to a UTF-8 locale. So we need a library that can process UTF-8
regardless of the locale setting.
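
To make the requirement concrete, here is a minimal sketch of locale-independent space compression over UTF-8. It is not nmh code: the names utf8_decode(), ucs_isspace(), and utf8_compress_spaces() are hypothetical, and the whitespace table is an assumption based on the Unicode White_Space property. Whether code points such as U+00A0 should be collapsed is a policy choice the format engine would have to make.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode one UTF-8 sequence from s (at most len bytes). Stores the code
 * point in *cp and returns the number of bytes used, or 0 if the input is
 * empty, truncated, or malformed. No locale functions are called.
 */
size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }

    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return need;
}

/* Assumed whitespace set: code points with the Unicode White_Space property. */
int
ucs_isspace(uint32_t cp)
{
    return (cp >= 0x09 && cp <= 0x0D) || cp == 0x20 || cp == 0x85 ||
           cp == 0xA0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F || cp == 0x205F ||
           cp == 0x3000;
}

/*
 * Copy in[0..len) to out, replacing each run of whitespace with a single
 * ASCII space. Malformed bytes are copied through unchanged. The output is
 * never longer than the input; returns the number of bytes written.
 */
size_t
utf8_compress_spaces(const char *in, size_t len, char *out)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t i = 0, o = 0;
    int in_space = 0;

    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(p + i, len - i, &cp);

        if (n == 0) {
            out[o++] = in[i++];
            in_space = 0;
        } else if (ucs_isspace(cp)) {
            if (!in_space)
                out[o++] = ' ';
            in_space = 1;
            i += n;
        } else {
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
            in_space = 0;
        }
    }
    return o;
}
```

Because nothing here calls setlocale() or any of the is*() functions, the result should be the same whether the process runs in the C locale or a UTF-8 one.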
> Character encoding choices can get quite political.
>
> If a library is adopted, then users have full control of what encoding
> they prefer.
Well, I was thinking that the locale would control the display/encoding
character set, like it does now.
>- As for parsing message headers, make it a configurable option
> on what the default character encoding should be. UTF-8 could be the
> default (which fortunately is US-ASCII compatible).
>
> Real-world note: I have encountered emails that actually use a
> non-ASCII default encoding for message header data: messages in
> non-English locales. Technically, these messages are not conformant to
> the RFCs, but such messages actually exist. Hence, in my project, I
> support an option that specifies what the default encoding is.
While I understand where you're coming from, back before EAI those
messages were invalid according to the RFCs. Now the RFCs have changed
and those messages are defined as being UTF-8, full stop, no exceptions.
I understand the need to define a default character set for messages
which don't meet the RFCs, but it feels wrong to me to allow the user to
override the interpretation of a message which is now legal. I welcome
discussion in this area.
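
One way to picture the policy described above is the sketch below: raw header bytes that are well-formed UTF-8 are always treated as UTF-8, and a user-configured default applies only to headers that are not. This is an illustration under assumptions, not nmh code; utf8_is_valid() and header_charset() are hypothetical names, and the fallback charset is whatever a profile entry would supply. Note that short text in a legacy charset can occasionally happen to be valid UTF-8, so the check is a heuristic.

```c
#include <stddef.h>

/* Return 1 if the len bytes at s are well-formed UTF-8, otherwise 0. */
int
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        unsigned long cp, min;
        size_t need, k;         /* need = number of continuation bytes */

        if (b < 0x80) {
            i++;
            continue;
        }
        if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;           /* invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;           /* sequence runs past the end */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}

/*
 * Pick the charset used to interpret raw (unencoded) header bytes.
 * Valid UTF-8 is never overridden; the caller's fallback is returned
 * only for headers that are not well-formed UTF-8.
 */
const char *
header_charset(const char *raw, size_t len, const char *fallback)
{
    return utf8_is_valid((const unsigned char *) raw, len) ? "UTF-8" : fallback;
}
```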
>- I think it is perfectly reasonable to leverage the current locale
> setting to determine defaults, but one should be able to explicitly
> override such defaults via .mh_profile and command-line options.
Well, a user can already override that by changing locale environment
variables. To me that seems like the right mechanism; you can do that
on the command line, with shell wrappers, whatever.
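
For reference, this is roughly how a C program can ask the environment's locale for its character set using the standard setlocale() and nl_langinfo() calls. It is a sketch only; display_charset() is a hypothetical name and does not describe how nmh actually does it.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Return the codeset name of the locale chosen by LC_ALL, LC_CTYPE, or
 * LANG. If the environment names a locale the system does not support,
 * setlocale() fails and we explicitly select the "C" locale instead.
 */
const char *
display_charset(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        setlocale(LC_CTYPE, "C");
    return nl_langinfo(CODESET);
}
```

With something like this in place, running a command as `LC_ALL=en_US.UTF-8 scan` (assuming that locale is installed) would change the reported codeset for that one invocation without any profile entry or command-line switch.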
>Warning message(s) should be generated when character data is lost due
>to conversion.
It's unclear to me where those messages should go, and it doesn't seem like
anyone else does that.
--Ken