[Lynx-dev] ISO-8859-8-I
From: Owen Leibman
Subject: [Lynx-dev] ISO-8859-8-I
Date: Tue, 21 Feb 2012 13:07:56 -0800 (PST)
The W3C recommends (see
http://www.w3.org/TR/html4/struct/dirlang.html#bidi88598) the use
of the character set ISO-8859-8-I rather than ISO-8859-8. Although Lynx
recognizes ISO-8859-8 as a valid encoding, it does not recognize the
character set ISO-8859-8-I (nor ISO-8859-8-E),
and treats the encoding as ISO-8859-1 if so specified.
This is true whether the character set is specified in a meta tag (using
either Content-Type or charset) or in an HTTP header. Test pages that
demonstrate the problem are at:
http://www.dayenu.com/lieberman.iso88598i.htm (8859-8-i handled incorrectly)
http://www.dayenu.com/lieberman.iso88598.htm (8859-8 handled correctly)
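For readers unfamiliar with where the label comes from, here is a minimal sketch in C of pulling the charset parameter out of a Content-Type value such as "text/html; charset=ISO-8859-8-I". It is not Lynx code: the names has_prefix_nocase and extract_charset are hypothetical, and the parsing is deliberately simplified.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lynx: case-insensitive prefix test. */
int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Hypothetical helper, not part of Lynx: copy the charset parameter of a
 * Content-Type value into out. Returns 1 if a non-empty value was found,
 * 0 otherwise. Simplified: it matches the first "charset=" substring and
 * does not validate the rest of the header.
 */
int extract_charset(const char *content_type, char *out, size_t outsize)
{
    const char *p;
    size_t n = 0;

    if (outsize == 0)
        return 0;
    for (p = content_type; *p; p++) {
        if (has_prefix_nocase(p, "charset="))
            break;
    }
    if (!*p)
        return 0;
    p += strlen("charset=");
    if (*p == '"')              /* tolerate a quoted value */
        p++;
    while (*p && *p != '"' && *p != ';' &&
           !isspace((unsigned char) *p) && n + 1 < outsize)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}
```

The same value arrives whether the page uses a meta tag or the server sends an HTTP header, which is why a single lookup function is the natural place to handle the label.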
Although LYCharSets.c contains code to recognize the two encodings, that
code seems ineffective however the site specifies the character set. On the
other hand, it seems sufficient, in all cases, to modify UCdomap.c to treat
ISO-8859-8-I and ISO-8859-8-E as aliases of ISO-8859-8.
A diff to accomplish this follows:
--- src/UCdomap.c.orig 2012-02-21 05:11:03.519199979 -0800
+++ src/UCdomap.c 2012-02-21 05:14:10.120125290 -0800
@@ -1559,6 +1559,10 @@ int UCGetLYhndl_byMIME(const char *value
     if (!strncasecomp(value, "iso", 3) && !StrNCmp(value + 3, "8859", 4)) {
         return getLYhndl_byCP("iso-", value + 3);
     }
+    if (!strcasecomp(value, "iso-8859-8-i") ||
+        !strcasecomp(value, "iso-8859-8-e")) {
+        return UCGetLYhndl_byMIME("iso-8859-8");
+    }
 #if !NO_CHARSET_euc_jp
     if (!strcasecomp(value, "x-euc-jp") ||
         !strcasecomp(value, "eucjp")) {
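
The aliasing idea in the diff can also be shown outside of Lynx. The following is a standalone sketch in C, not the Lynx implementation: equal_nocase and canonical_charset are hypothetical names, and the table holds only the two labels discussed here. It maps a label to the name that is already recognized, which works for character mapping because all three labels share the same code table and differ only in how text direction is handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Lynx: case-insensitive string equality. */
int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/*
 * Hypothetical helper, not part of Lynx: return the canonical name for a
 * charset label, or the label itself when no alias entry matches.
 */
const char *canonical_charset(const char *name)
{
    static const struct {
        const char *alias;
        const char *canonical;
    } aliases[] = {
        { "iso-8859-8-i", "iso-8859-8" },
        { "iso-8859-8-e", "iso-8859-8" },
    };
    size_t i;

    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (equal_nocase(name, aliases[i].alias))
            return aliases[i].canonical;
    }
    return name;
}
```

A table keeps further aliases to a one-line change, whereas the diff above adds the two comparisons inline to match the style of the surrounding checks in UCGetLYhndl_byMIME.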