emacs-devel

Re: gnus should accept UTF8 even if UTF-8 is standard


From: Stephen J. Turnbull
Subject: Re: gnus should accept UTF8 even if UTF-8 is standard
Date: Wed, 22 Oct 2008 11:34:17 +0900

Eli Zaretskii writes:

 > > > Perhaps something like `canonicalize-coding-system-name' would be good.
 > > 
 > > That implies that the return value would be a string, not the coding
 > > system itself.  I suggest we return the coding system (or nil), not
 > > just the name.
 > 
 > What I meant is that, instead of returning a _string_, which is the
 > name of a coding system, it is better to return a _symbol_ of that
 > coding system.

Of course.  My point is that the symbol is the name, and therefore
"canonicalize-coding-system-name" is a reasonable name for this
function.

If it weren't for the conflict with XEmacs, which still needs
`get-coding-system' to return a coding system object, I'd be perfectly
happy using that name instead.

 > > AIUI, the point of the function is to guess what people who don't
 > > know what they're doing are trying to express (and to provide some
 > > interactive convenience to people who do know what they're doing).
 > 
 > Agreed, but in most cases the argument will be a valid MIME charset.

Except when Richard<wink> is typing, and surely we all consider that
an important use case?  Aside from Richard's expressed preference for
a harmless convenience, the presence or absence of one or more hyphens
is something the various standards disagree about:

 > The case of "UTF8" is an exception.

Well, no, I think it is not.  AFAIK only one of "iso-8859-1" and
"iso8859-1" is registered, but Emacs uses the former exclusively, and
X11 only the latter (in XLFDs).  Both are acceptable to iconv.  (And
the ISO standards actually use "ISO 8859/1" which isn't even
acceptable to glibc iconv!)

 > And even in this exceptional case, I understand that "UTF8" came
 > from some charset= header.  That is why I suggested
 > coding-system-for-charset.

Well, the MIME nomenclature is seriously broken.  A substantial
minority of the things it denotes "charsets" are not "character sets"
in any sense.

 > I don't mind coding-system-for-mime-charset, either, if that was
 > your point.

That's the worst of several suggestions, as this mapping is not
limited to MIME charsets, but is useful for coding systems in general,
as the usage of hyphens in their names has neither rhyme nor reason.
Is it "KOI8-R" or "KOI-8R"?  That one confused me, at least, for a while.
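
To make the idea concrete, here is a rough sketch (in Python, purely for
illustration) of hyphen-insensitive name lookup.  The table of canonical
names and the function names are hypothetical, not anything in Emacs or
XEmacs; a real implementation would consult the actual list of coding
systems and return the symbol rather than a string.

```python
import re

# Hypothetical table of canonical spellings (an assumption for this sketch).
CANONICAL_NAMES = ["utf-8", "iso-8859-1", "koi8-r", "euc-jp"]


def _squash(name):
    """Lower-case NAME and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Map each squashed spelling to its canonical name.
_BY_SQUASHED = {_squash(n): n for n in CANONICAL_NAMES}


def canonicalize_coding_system_name(name):
    """Return the canonical spelling of NAME, or None if it is unknown."""
    return _BY_SQUASHED.get(_squash(name))


if __name__ == "__main__":
    for spelling in ["UTF8", "iso8859-1", "ISO 8859/1", "KOI-8R", "bogus"]:
        print(spelling, "->", canonicalize_coding_system_name(spelling))
```

Ignoring punctuation entirely is the crudest possible rule, and it would
conflate any two names that differ only in punctuation, so treat it as a
starting point rather than a proposal.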

 > (In Emacs 23+, the original Mule meaning of "charset" will fade
 > out.)

That would be sad.  While I agree that UTF-8 will fairly quickly
become universal for current text documents, I don't expect the vast
amount of legacy archives to be converted any time soon (some will be
converted at the time of converting to new media, but human beings
being what they are, I expect that for a couple of centuries some
bureaucrats will just make bit-level copies ;-).  Emacs should be the
premier application for reading those!




