lynx-dev

Re: [Lynx-dev] Unicode-marking, &c


From: David Woolley
Subject: Re: [Lynx-dev] Unicode-marking, &c
Date: Fri, 27 Feb 2009 08:25:32 +0000
User-agent: Thunderbird 2.0.0.19 (X11/20081209)

Thomas Dickey wrote:

> Lynx assumes the document charset is ISO-8859-1 if it's not given.
> (That was the rule for some time - for HTML - perhaps we're not
> discussing HTML anymore).

It hasn't been the rule for around a decade: HTML 4.0 overrides HTTP/1.1's ISO-8859-1 default for the text/html media type. Failure to specify a charset is an error. Browsers must not assume a default, but may use heuristics. (In practice, one of the heuristics is to use a default!)
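A minimal sketch of that resolution order, assuming the only inputs are the Content-Type header value and a locally configured fallback. The function names and the fallback policy are hypothetical. This is not how Lynx does it. Python's `email.message.Message.get_content_charset()` returns the lowercased charset parameter, or None when it is absent:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when missing.
    return msg.get_content_charset()


def effective_charset(content_type: str, fallback: str = "iso-8859-1") -> str:
    """Use the declared charset if present, otherwise fall back.

    The fallback is a hypothetical configured default, i.e. the
    "heuristic" that browsers commonly end up using in practice.
    """
    return declared_charset(content_type) or fallback


print(effective_charset("text/html; charset=UTF-8"))  # utf-8
print(effective_charset("text/html"))                 # iso-8859-1
```

The missing-charset case is the one HTML 4.0 treats as an error. The fallback only papers over it.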

Without a charset, therefore, it is reasonable, but not required, for a browser to assume that a document starting with a UTF-* byte order mark is in that UTF encoding.
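Byte order mark sniffing can be sketched as below. This assumes the document has no declared charset and that ISO-8859-1 is the hypothetical last-resort default. The UTF-32LE mark must be tested before the UTF-16LE mark, because the former begins with the latter:

```python
import codecs
from typing import Optional, Tuple

# Longest/most specific first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a UTF byte order mark."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def decode_undeclared(data: bytes, default: str = "iso-8859-1") -> str:
    """Decode a document with no declared charset: BOM first, else a default."""
    found = sniff_bom(data)
    if found:
        name, skip = found
        return data[skip:].decode(name)  # strip the BOM itself
    return data.decode(default)
```

This is only a heuristic. An ISO-8859-1 document that happens to begin with "þÿ" (bytes FE FF) would be misread as UTF-16BE.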

Also, outside the USA and Western Europe, it became quite common practice to use tools that declared windows-1252, etc., but actually sent the local encoding. People in those regions had no problems, as they locked their browsers into, say, GB2312 and ignored the declared charset completely. Nowadays, there is a mix of UTF and GB2312, so that strategy may no longer work.
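The following hypothetical example shows the mislabelling problem. The text is encoded as GB2312 and then decoded two ways: once as the (wrong) windows-1252 label says, and once as a browser locked to GB2312 would decode it:

```python
# Hypothetical page: written in GB2312 but labelled windows-1252.
text = "中文"
body = text.encode("gb2312")         # the bytes actually sent

as_labelled = body.decode("cp1252")  # a browser honouring the label
as_forced = body.decode("gb2312")    # a browser locked to GB2312

print(as_labelled)  # mojibake: Latin letters with accents, not Chinese
print(as_forced)    # 中文
```

Locking the browser works only while every page uses the same real encoding. A site that mixes in UTF-8 pages breaks that assumption.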

Setting that to UTF-8 makes it display properly.

0xFE is a valid ISO-8859-1 code, as your terminal emulator shows...
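As a small illustration, the byte 0xFE decodes to U+00FE (þ, LATIN SMALL LETTER THORN) under ISO-8859-1. That encoding maps every byte 0x00-0xFF to the code point of the same value. The same byte can never occur in well-formed UTF-8:

```python
raw = b"\xfe"

# ISO-8859-1: byte value == code point, so 0xFE is U+00FE (thorn).
latin1 = raw.decode("iso-8859-1")
print(latin1, hex(ord(latin1)))  # þ 0xfe

# UTF-8: 0xFE is never a valid byte, so a strict decoder rejects it.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```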



--
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.
