emacs-devel

Re: Rmail-mbox branch


From: David De La Harpe Golden
Subject: Re: Rmail-mbox branch
Date: Mon, 08 Sep 2008 18:55:08 +0100
User-agent: Mozilla-Thunderbird 2.0.0.16 (X11/20080724)

Richard M. Stallman wrote:

> HTML is made of plain ASCII text.

(1) No it isn't, at least not necessarily.
http://www.w3.org/TR/REC-html40-971218/charset.html#encodings

While HTML numeric/entity refs _allow_ almost all non-7-bit chars to
make it through an ASCII channel, IF someone uses them, AFAIK their use
is _not_ mandated: you can e.g. just write your HTML in UTF-8 and use ☺
characters directly.  I.e. they're a facility a bit like C
trigraphs (entity refs have a range of other uses in SGML/XML land).
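
To make the two spellings concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing in this thread depends on it).  It writes the same ☺ once as a numeric character reference, which is pure ASCII, and once as raw UTF-8 bytes, which is not:

```python
import html

smiley = "\u263a"  # WHITE SMILING FACE, the character used above

# Spelling 1: numeric character reference, survives a 7-bit ASCII channel.
as_ref = smiley.encode("ascii", "xmlcharrefreplace")

# Spelling 2: the character written directly in a UTF-8 encoded document.
as_utf8 = smiley.encode("utf-8")

# An HTML consumer resolves the reference back to the same character.
decoded = html.unescape(as_ref.decode("ascii"))

print(as_ref)    # pure ASCII bytes
print(as_utf8)   # three bytes, all with the high bit set
print(decoded == smiley)
```

Both are valid HTML; only the first is plain ASCII.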


(2) text/html can have a declared charset.  It _may_ be US-ASCII,
but it need not be.
http://www.ietf.org/rfc/rfc2854.txt

"Because of the availability within HTML itself for using character
entity references, documents that use a wide repertoire of characters
may still be represented using the US-ASCII charset and transported
without encoding.  However, transport of text/html using a charset
other than US-ASCII may require base64 or quoted-printable encoding
for 7-bit channels."

When I wrote a mostly-ascii Unicode HTML mail with a capable mailer (not
that I make a habit of HTML mail, bleurgh), it got sent as:

Content-Type: text/html;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
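
As a rough illustration of what a receiving program has to do with headers like these, the sketch below uses Python's standard `email` package.  The message body is made up for the example (it is not the mail I actually sent); the point is only that the declared charset and transfer encoding both have to be honoured before the HTML is text again:

```python
from email import message_from_string

# Hypothetical message: the headers match the ones above, the body is invented.
raw = (
    'Content-Type: text/html;\n'
    '  charset="utf-8"\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    '<p>smile: =E2=98=BA</p>\n'
)

msg = message_from_string(raw)

# The charset comes from the Content-Type parameter, not from the HTML itself.
charset = msg.get_content_charset()

# decode=True undoes the quoted-printable layer and yields raw bytes;
# the declared charset then turns those bytes into characters.
body = msg.get_payload(decode=True).decode(charset)

print(charset)
print(body)
```

On the wire the message is 7-bit clean, but the HTML it carries is UTF-8, not ASCII.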


At some point, after I pasted in a big bunch of Unicode squiggles, it
decided (presumably via some heuristic on the proportion of
non-7-bit-ASCII chars or on encoded length) to switch to:

Content-Type: text/html;
  charset="utf-8"
Content-Transfer-Encoding: base64
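
I don't know what heuristic the mailer really uses.  Purely as a guess at the "encoded length" variant, the sketch below (Python, with a hypothetical helper named `pick_cte`) encodes the body both ways and keeps the shorter result.  It is an assumption for illustration, not Thunderbird's actual logic:

```python
import base64
import quopri


def pick_cte(text):
    """Guess a Content-Transfer-Encoding by comparing encoded sizes.

    Hypothetical heuristic: NOT taken from any real mailer's source.
    """
    data = text.encode("utf-8")
    if data.isascii():
        return "7bit"  # nothing to protect on a 7-bit channel
    qp_len = len(quopri.encodestring(data))
    b64_len = len(base64.encodebytes(data))
    return "quoted-printable" if qp_len <= b64_len else "base64"


mostly_ascii = "a" * 90 + "\u263a"   # one squiggle in a sea of ASCII
squiggles = "\u263a" * 30            # nothing but squiggles

print(pick_cte(mostly_ascii))
print(pick_cte(squiggles))
```

Quoted-printable costs three bytes per non-ASCII byte but leaves ASCII alone, while base64 costs a flat four bytes per three, so a crossover like the one I observed is what you'd expect from any size-based rule.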
















