Thomas Dickey wrote:
9j/9k look at this line
<?xml-stylesheet href="/ve.css" type="text/css"?>
and because it's saying that it's xml without specifying a charset,
assumes it's UTF-8.
That line says "Link a stylesheet to this XML document". It doesn't
specify that it is XML at all (although you might imply that from that
information).
The fact that there is no XML prolog at all would imply that it was
UTF-8 (since that is the default), but the document is being served as
text/html, so tag soup slurping rules apply.
(Incidentally, Appendix C (a non-normative, fuzzily written part of the
spec that is nonetheless a requirement for serving XHTML as text/html)
says that the stylesheet declaration should not be present, since it is
a processing instruction: http://www.w3.org/TR/xhtml1/#C_1 )
HTTP defaults to ISO-8859-1 for text/* documents that don't specify
otherwise.
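As a rough illustration of that rule, here is how a client might pick the charset out of a Content-Type header value. It is a Python sketch, not any particular browser's code: the function name is made up, the parsing ignores quoted-string corner cases, and it reflects the RFC 2616 (HTTP/1.1) behaviour. RFC 7231 later dropped the ISO-8859-1 default for text/*.

```python
def charset_from_content_type(content_type):
    """Return the charset for a Content-Type header value (hypothetical helper).

    Uses an explicit charset parameter if present, otherwise falls back to
    ISO-8859-1 for text/* types (the RFC 2616 default) and returns None for
    other media types.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            # Strip optional surrounding quotes, e.g. charset="utf-8".
            return value.strip().strip('"').lower()
    if media_type.startswith("text/"):
        return "iso-8859-1"
    return None
```

So a page served as plain `text/html` ends up as ISO-8859-1 at the HTTP level, while `application/xhtml+xml` with no parameter is left for the XML rules to decide.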
However, I see the following
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Ah, the wonder that is "http-equiv".
Here is what the spec has to say on the subject:
"This attribute may be used in place of the name attribute. HTTP
servers use this attribute to gather information for HTTP response
message headers."
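A client that does honour http-equiv has to dig the charset out of the markup itself. The following is a crude Python sketch of that, not how any real browser does it. Its assumptions: the http-equiv attribute comes before content, and only the first 1024 bytes are examined (the HTML5 prescan uses a similar limit but is a proper tokenizer, not a regex). The newer `<meta charset=...>` form is ignored, and the name `charset_from_meta` is hypothetical.

```python
import re

# Simplified pattern: <meta ... http-equiv="Content-Type" ... content="...charset=XYZ...">
# Assumes http-equiv appears before content; attribute order is not guaranteed in real pages.
_META_RE = re.compile(
    rb"""<meta\s[^>]*http-equiv\s*=\s*["']?content-type["']?[^>]*"""
    rb"""content\s*=\s*["'][^"']*charset\s*=\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def charset_from_meta(body, limit=1024):
    """Return the charset named by a meta http-equiv element, or None (hypothetical helper)."""
    match = _META_RE.search(body[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Run against the meta element quoted above, this returns `iso-8859-1`.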
Yup, it is meant for servers, not clients. Guess what, though? Other
clients do pay attention to it, and a significant number of sites depend
on that.
Basically, the whole issue is underspecified and suffers from browsers
paying attention to things they should ignore, recovering in interesting
ways, and having wonderful conflicts when there are multiple ways to
specify the same thing, some of which apply differently depending on the
HTTP headers served up.
I haven't looked at it in detail, but the HTML5 draft is likely to give
the safest (i.e. most consistent with everything else) approach to
dealing with character encoding (at least in text/html):
http://www.w3.org/TR/html5/parsing.html#determining
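For a feel of what that section does, here is a heavily simplified Python sketch of the precedence. A byte order mark wins, then the transport-layer (HTTP) charset, then an in-document declaration, then a default. The order loosely follows the current WHATWG version of the algorithm, and the linked draft may differ in detail. The sketch leaves out user overrides, "tentative" confidence, and re-parsing when a later declaration disagrees. It also skips the Encoding Standard's label mapping, under which "iso-8859-1" is treated as windows-1252. The function name, its parameters, and the windows-1252 default (the spec leaves the default locale-dependent) are all assumptions.

```python
import codecs

# Byte order marks checked first (UTF-8, then UTF-16 little- and big-endian).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def choose_encoding(body, header_charset=None, meta_charset=None, default="windows-1252"):
    """Pick an encoding and report where it came from (hypothetical, simplified).

    header_charset: charset parameter from the HTTP Content-Type, or None if absent.
    meta_charset: charset found in the document's own meta element, or None.
    """
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "byte order mark"
    for candidate, source in ((header_charset, "HTTP header"), (meta_charset, "meta element")):
        if candidate:
            try:
                # Normalise the label to Python's codec name.
                return codecs.lookup(candidate).name, source
            except LookupError:
                pass  # Unknown label: fall through to the next source.
    return default, "default"
```

Passing `header_charset=None` here means "the header had no charset parameter". Applying the RFC 2616 ISO-8859-1 default at that point instead would make the meta element unreachable, which is exactly the kind of conflict between mechanisms described above.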