LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c
From: Hynek Med
Subject: LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c
Date: Sun, 9 Mar 1997 18:43:05 +0100 (MET)
I have added ISO-8859-2 entities like &ccaron; to Klaus' charset patches.
I hope I got it all right. Is &udie; a 'u' with diaeresis, and is 'a'
with double accents &adouble;? Where can I get these standards?
It's strange anyway: &cacute; works, while &tacute; doesn't. &Aogon;
works, &aogon; doesn't. Either I forgot something (like sorting it) or
there's something wrong with the code.
An example of this:
Document with &aogon; &Aogon; &Ccaron; &uring; &udie; produces:
&aogon; ¡ &Ccaron; &uring; ü
(Aogon and udie are OK, others not.)
Trace output shows:
SGML: Unknown entity Aogon so far, checking extra...
SGML: Unknown entity Ccaron so far, checking extra...
SGML: Unknown entity Ccaron
SGML: Unknown entity uring so far, checking extra...
SGML: Unknown entity uring
SGML: Unknown entity udie so far, checking extra...
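The "(like sorting it)" guess is worth checking. If the extra_entities lookup in SGML.c is a binary search (an assumption; the trace only shows that the extra table is consulted after the main entity list misses), the table must be sorted with the same comparison used to search it, or lookups can miss entries that are present. A minimal sketch using the C library's bsearch(); entity_info, sorted_extra and lookup_entity are hypothetical names, not Lynx code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for UC_entity_info (the real one is in SGML.h):
   an entity name and its Unicode code point. */
typedef struct {
    const char *name;
    long code;
} entity_info;

/* Excerpt of the patched table, re-sorted into strcmp() byte order.
   In ASCII, upper-case letters sort before lower-case ones. */
static const entity_info sorted_extra[] = {
    {"Aogon",  0x0104},
    {"Ccaron", 0x010c},
    {"comma",  44},
    {"lrm",    8206},
    {"rlm",    8207},
    {"udie",   0x00fc},
    {"uring",  0x016f},
    {"zwj",    8205},
    {"zwnj",   8204},
};

#define N_ENTRIES(t) (sizeof (t) / sizeof (t)[0])

/* bsearch() comparator: the key is the entity name being looked up,
   the element is a table entry. Case-sensitive, like SGML entity names. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const entity_info *)elem)->name);
}

/* Binary search; only reliable when the table is sorted with the same
   comparison. Returns the code point, or -1 if the search misses. */
static long lookup_entity(const entity_info *table, size_t n, const char *name)
{
    const entity_info *hit = bsearch(name, table, n, sizeof *table, cmp_name);
    return hit != NULL ? hit->code : -1;
}
```

With every entry in strcmp() order, Aogon, Ccaron, uring and udie are all found. Over the patch's ordering, where Aogon and the other capitals follow lower-case names such as zwj, a binary search can step past an entry that is present, which could explain why some of the new entities resolve and others do not.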
Attached to this mail is my patch, relative to the lynx2-7 subdirectory (i.e.
cd to lynx2-7, then apply the patch). I can produce other entities as well
(e.g. for ISO-8859-3, or the ISO-8859-2 characters still missing because I
don't know their HTML names, such as DOUBLE ACUTE ACCENT, MULTIPLICATION SIGN,
DIVISION SIGN, DOT ABOVE...). I have a perl script that generates them from
Unicode's 8859-x.TXT; I only need to know relationships like
"X WITH CIRCUMFLEX -> &xcirc;".
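The Unicode mapping files name each letter systematically (for example LATIN SMALL LETTER O WITH DOUBLE ACUTE), so the relationship asked for above can be expressed as a lookup from accent name to entity suffix. A minimal sketch in C, assuming the suffix conventions of the ISO 8879 Latin entity sets (uml for diaeresis, dblac for double acute, ogon, strok, cedil); under that convention the patch's &udie; and &odouble; would be &uuml; and &odblac;. The names accent_map and entity_name_from_unicode are hypothetical:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Accent names as they appear in Unicode character names, paired with
   the suffixes used by the ISO 8879 Latin entity sets (e.g. Ccaron,
   Auml, Odblac). Illustrative table, not Lynx code. */
static const struct {
    const char *accent;
    const char *suffix;
} accent_map[] = {
    {"ACUTE", "acute"},
    {"BREVE", "breve"},
    {"CARON", "caron"},
    {"CEDILLA", "cedil"},
    {"CIRCUMFLEX", "circ"},
    {"DIAERESIS", "uml"},
    {"DOT ABOVE", "dot"},
    {"DOUBLE ACUTE", "dblac"},
    {"GRAVE", "grave"},
    {"OGONEK", "ogon"},
    {"RING ABOVE", "ring"},
    {"STROKE", "strok"},
    {"TILDE", "tilde"},
};

/* Builds an entity name such as "Ccaron" from a Unicode name such as
   "LATIN CAPITAL LETTER C WITH CARON". Returns 0 on success, or -1 if
   the name is not "<letter> WITH <known accent>" or out is too small. */
static int entity_name_from_unicode(const char *uname, char *out, size_t outlen)
{
    static const char cap_prefix[] = "LATIN CAPITAL LETTER ";
    static const char small_prefix[] = "LATIN SMALL LETTER ";
    const char *p;
    int upper;
    size_t i;

    if (strncmp(uname, cap_prefix, sizeof cap_prefix - 1) == 0) {
        upper = 1;
        p = uname + sizeof cap_prefix - 1;
    } else if (strncmp(uname, small_prefix, sizeof small_prefix - 1) == 0) {
        upper = 0;
        p = uname + sizeof small_prefix - 1;
    } else {
        return -1; /* e.g. MULTIPLICATION SIGN: needs a hand-written entry */
    }

    /* Expect one base letter followed by " WITH " (6 chars), then the accent. */
    if (!isalpha((unsigned char)p[0]) || strncmp(p + 1, " WITH ", 6) != 0)
        return -1;

    for (i = 0; i < sizeof accent_map / sizeof accent_map[0]; i++) {
        /* Exact match, so "DOUBLE ACUTE" never matches the "ACUTE" entry. */
        if (strcmp(p + 7, accent_map[i].accent) == 0) {
            char letter = upper ? (char)toupper((unsigned char)p[0])
                                : (char)tolower((unsigned char)p[0]);
            int n = snprintf(out, outlen, "%c%s", letter, accent_map[i].suffix);
            return (n > 0 && (size_t)n < outlen) ? 0 : -1;
        }
    }
    return -1;
}
```

Characters whose Unicode names are not of the letter-plus-accent form (MULTIPLICATION SIGN, DIVISION SIGN, the spacing accents) fall through with -1 and would still need names chosen by hand.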
Hynek
--
Hynek Med, address@hidden
--- WWW/Library/Implementation/HTMLDTD.c.orig Sun Mar 9 14:45:10 1997
+++ WWW/Library/Implementation/HTMLDTD.c Sun Mar 9 15:10:02 1997
@@ -154,14 +154,101 @@
/* UC_entity_info structure is defined in SGML.h. */
static CONST UC_entity_info extra_entities[] = {
- {"Aogon", 0x0104}, /* TEST */
- {"ccaron", 0x010d}, /* c with caron */
+
+/* Klaus' tests */
+
{"comma", 44}, /* TEST */
{"lrm", 8206}, /* left-to-right mark */
{"rlm", 8207}, /* right-to-left mark */
- {"zcaron", 0x017e}, /* z with caron */
{"zwnj", 8204}, /* zero width non-joiner */
{"zwj", 8205}, /* zero width joiner */
+
+/* ISO-8859-2 entities added by address@hidden
+ I'm not sure if &udie; is right for 'u' with
+ diaeresis, and whether 'a' with double accents
+ is really &adouble;
+*/
+
+ {"Aogon", 0x0104}, /* A with ogonek */
+ {"Lstrok", 0x0141}, /* L with stroke */
+ {"Lcaron", 0x013d}, /* L with caron */
+ {"Sacute", 0x015a}, /* S with acute */
+ {"Scaron", 0x0160}, /* S with caron */
+ {"Scedil", 0x015e}, /* S with cedilla */
+ {"Tcaron", 0x0164}, /* T with caron */
+ {"Zacute", 0x0179}, /* Z with acute */
+ {"Zcaron", 0x017d}, /* Z with caron */
+ {"Zdot", 0x017b}, /* Z with dot above */
+ {"aogon", 0x0105}, /* a with ogonek */
+ {"lstrok", 0x0142}, /* l with stroke */
+ {"lcaron", 0x013e}, /* l with caron */
+ {"sacute", 0x015b}, /* s with acute */
+ {"scaron", 0x0161}, /* s with caron */
+ {"scedil", 0x015f}, /* s with cedilla */
+ {"tcaron", 0x0165}, /* t with caron */
+ {"zacute", 0x017a}, /* z with acute */
+ {"zcaron", 0x017e}, /* z with caron */
+ {"zdot", 0x017c}, /* z with dot above */
+ {"Racute", 0x0154}, /* R with acute */
+ {"Aacute", 0x00c1}, /* A with acute */
+ {"Acirc", 0x00c2}, /* A with circumflex */
+ {"Abreve", 0x0102}, /* A with breve */
+ {"Adie", 0x00c4}, /* A with diaeresis */
+ {"Lacute", 0x0139}, /* L with acute */
+ {"Cacute", 0x0106}, /* C with acute */
+ {"Ccedil", 0x00c7}, /* C with cedilla */
+ {"Ccaron", 0x010c}, /* C with caron */
+ {"Eacute", 0x00c9}, /* E with acute */
+ {"Eogon", 0x0118}, /* E with ogonek */
+ {"Edie", 0x00cb}, /* E with diaeresis */
+ {"Ecaron", 0x011a}, /* E with caron */
+ {"Iacute", 0x00cd}, /* I with acute */
+ {"Icirc", 0x00ce}, /* I with circumflex */
+ {"Dcaron", 0x010e}, /* D with caron */
+ {"Dstrok", 0x0110}, /* D with stroke */
+ {"Nacute", 0x0143}, /* N with acute */
+ {"Ncaron", 0x0147}, /* N with caron */
+ {"Oacute", 0x00d3}, /* O with acute */
+ {"Ocirc", 0x00d4}, /* O with circumflex */
+ {"Odouble", 0x0150}, /* O with double acute */
+ {"Odie", 0x00d6}, /* O with diaeresis */
+ {"Rcaron", 0x0158}, /* R with caron */
+ {"Uring", 0x016e}, /* U with ring above */
+ {"Uacute", 0x00da}, /* U with acute */
+ {"Udouble", 0x0170}, /* U with double acute */
+ {"Udie", 0x00dc}, /* U with diaeresis */
+ {"Yacute", 0x00dd}, /* Y with acute */
+ {"Tcedil", 0x0162}, /* T with cedilla */
+ {"racute", 0x0155}, /* r with acute */
+ {"aacute", 0x00e1}, /* a with acute */
+ {"acirc", 0x00e2}, /* a with circumflex */
+ {"abreve", 0x0103}, /* a with breve */
+ {"adie", 0x00e4}, /* a with diaeresis */
+ {"lacute", 0x013a}, /* l with acute */
+ {"cacute", 0x0107}, /* c with acute */
+ {"ccedil", 0x00e7}, /* c with cedilla */
+ {"ccaron", 0x010d}, /* c with caron */
+ {"eacute", 0x00e9}, /* e with acute */
+ {"eogon", 0x0119}, /* e with ogonek */
+ {"edie", 0x00eb}, /* e with diaeresis */
+ {"ecaron", 0x011b}, /* e with caron */
+ {"iacute", 0x00ed}, /* i with acute */
+ {"icirc", 0x00ee}, /* i with circumflex */
+ {"dcaron", 0x010f}, /* d with caron */
+ {"dstrok", 0x0111}, /* d with stroke */
+ {"nacute", 0x0144}, /* n with acute */
+ {"ncaron", 0x0148}, /* n with caron */
+ {"oacute", 0x00f3}, /* o with acute */
+ {"ocirc", 0x00f4}, /* o with circumflex */
+ {"odouble", 0x0151}, /* o with double acute */
+ {"odie", 0x00f6}, /* o with diaeresis */
+ {"rcaron", 0x0159}, /* r with caron */
+ {"uring", 0x016f}, /* u with ring above */
+ {"uacute", 0x00fa}, /* u with acute */
+ {"udouble", 0x0171}, /* u with double acute */
+ {"udie", 0x00fc}, /* u with diaeresis */
+ {"yacute", 0x00fd}, /* y with acute */
+ {"tcedil", 0x0163}, /* t with cedilla */
};
#endif /* EXP_CHARTRANS */
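Whatever the lookup method turns out to be, the ordering of the patched array is cheap to verify. A minimal sketch of a check one might run in a debug build, assuming strcmp() byte order is the order the lookup expects (an assumption about SGML.c, not confirmed here); entity_info, patch_head and first_unsorted are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for UC_entity_info from SGML.h. */
typedef struct {
    const char *name;
    long code;
} entity_info;

/* The first entries of the patched extra_entities[], in patch order. */
static const entity_info patch_head[] = {
    {"comma",  44},
    {"lrm",    8206},
    {"rlm",    8207},
    {"zwnj",   8204},
    {"zwj",    8205},
    {"Aogon",  0x0104},
    {"Lstrok", 0x0141},
};

/* Returns the index of the first entry whose name does not sort strictly
   after its predecessor in strcmp() order, printing the offending pair,
   or 0 if the table is sorted (index 0 can never be out of order). */
static size_t first_unsorted(const entity_info *t, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (strcmp(t[i - 1].name, t[i].name) >= 0) {
            fprintf(stderr, "out of order: \"%s\" before \"%s\"\n",
                    t[i - 1].name, t[i].name);
            return i;
        }
    }
    return 0;
}
```

On patch_head the check stops at index 4, where zwnj precedes zwj; the new block starting at Aogon would also be out of place, since upper-case letters sort before lower-case ones in ASCII.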