From: Klaus Weide
Subject: Re: lynx-dev chartrans to CJK-like display (was: stopping when viewing a site)
Date: Thu, 26 Aug 1999 04:16:39 -0500 (CDT)
On Thu, 26 Aug 1999, Henry Nelson wrote:
> Leonid:
> > FTP Directory: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/
> [...]
> Klaus probably understands better than I do what I want, but I definitely
> do NOT want CJK translation tables to become a part of Lynx.
Why not? :)
There would be advantages if Lynx could really translate between arbitrary
charsets including CJK, say "Japanese" (whatever charset) <--> Unicode.
For example, with the advent of utf-8 capable xterm, users of that should
be able to see Japanese text mixed with all kinds of other text, without
switching, even on the same screen. Given the right fonts of course -
I understand they exist. (I have not tried any of this.)
But don't worry, I think there is no imminent danger of someone adding full
CJK-charset <--> unicode translation...
Anyway, that's not what we were talking about; I think we all understood
that. I think Leonid's point was that that FTP directory might help you
find out the CPNNN name for what your system uses. (If there is one.
And if you really want to know.)
> Klaus used
> the term "CJK-with-.tbl-file approach". This means, as best as I can
> explain it, leave all CJK up to the Sato/Asada CJK handlers, and use a .tbl
> file for the decimal 160-255 characters that are NOT CJK (basically anything
> in the us-ascii extended _range_). The Sato/Asada CJK handlers do their
> job VERY well. First they decide what character set is being thrown at
> them, e.g., in Japanese there is euc-jp, sjis-jp and iso-2022-jp. This
> is the hardest part of the job. The character sets can be jumbled
> together, and in most cases the Sato routines will ferret out what is what.
> THEN they do a translation, for the multibyte characters, to the unique
> character set the user chooses.
I would prefer a different term here instead of translation; I would
rather call it re-encoding, just to separate it from the other,
table-driven translation.
> Hope I haven't totally confused you. As simply as I can put it:
> allow table translation for (all)* single-byte characters (16x16)
> send all multibyte characters to be handled by CJK routines
I know what you mean, although the terminology you use is a bit too limiting:
There is no reason to restrict this a priori to "single-byte characters".
Rather it should apply to everything that is recognized as a "Unicode
character" (i.e. translatable by lynx to a U+nnnn value), whether it
originally came in as iso-8859-1, koi8-r, utf-8, windows-1252 bytes,
or as character entities or NCRs or whatever. That excludes the CJK
"multibyte characters" (of course 8-bit bytes are treated as parts of
multibyte characters only in some modes); for everything else it's up to
whoever makes the table what gets translated.
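To make that split concrete, here is a minimal sketch in C. It is not Lynx
code: every name in it (Unit, TblEntry, demo_table, emit_unit) is made up
for illustration, and the three table entries merely stand in for whatever
a real .tbl file would supply. The only point is that the routing decision
is "did this resolve to a U+nnnn value?", not "was this a single byte?".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; none of these names exist in Lynx. */

typedef struct {
    unsigned int ucs;   /* Unicode code point (U+nnnn) */
    const char *out;    /* replacement string for the display charset */
} TblEntry;

/* Tiny stand-in for the contents of a .tbl file (assumed entries). */
static const TblEntry demo_table[] = {
    { 0x00A9, "(c)" },
    { 0x00E9, "e"   },
    { 0x2014, "--"  },
};

typedef enum { UNIT_UNICODE, UNIT_CJK_MULTIBYTE } UnitKind;

typedef struct {
    UnitKind kind;
    unsigned int ucs;   /* used when kind == UNIT_UNICODE */
    const char *raw;    /* used when kind == UNIT_CJK_MULTIBYTE: bytes as
                           already re-encoded by the CJK handlers */
} Unit;

static const char *table_lookup(unsigned int ucs)
{
    size_t i;
    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (demo_table[i].ucs == ucs)
            return demo_table[i].out;
    }
    return NULL;
}

/* Copy s into buf, truncating if necessary; always NUL-terminates. */
static void copy_out(char *buf, size_t buflen, const char *s)
{
    size_t n = strlen(s);
    assert(buf != NULL && buflen > 0);
    if (n >= buflen)
        n = buflen - 1;
    memcpy(buf, s, n);
    buf[n] = '\0';
}

/* Route one unit: CJK multibyte data bypasses the table entirely,
 * everything that has a Unicode value goes through it. */
static void emit_unit(const Unit *u, char *buf, size_t buflen)
{
    if (u->kind == UNIT_CJK_MULTIBYTE) {
        copy_out(buf, buflen, u->raw);          /* left to the CJK routines */
    } else if (u->ucs < 0x80) {
        char one[2];
        one[0] = (char) u->ucs;                 /* 7-bit == ASCII assumed */
        one[1] = '\0';
        copy_out(buf, buflen, one);
    } else {
        const char *t = table_lookup(u->ucs);
        copy_out(buf, buflen, t ? t : "?");     /* "?" if the table is silent */
    }
}
```

Whether U+00E9 originally arrived as an iso-8859-1 byte, a utf-8 sequence,
or as &eacute; makes no difference by the time emit_unit() sees it; only
the CJK multibyte units take the other branch.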
> *(all that can be easily determined or reasonably assumed to be)
That may depend on your system of course...
Anyway we are not talking about any new feature, just some tweaks to
enable a mechanism that is already there.
Getting this right for *7-bit* characters that need translation may
not be worth it (I am thinking about the '\' that is not a backslash).
The chartrans guts in UCdomap support translation of all bytes, but
the assumption that 7-bit==ASCII is made in multiple places.
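A small sketch of what that assumption costs, again in C and again not
taken from Lynx. It assumes the charset in question is something like
JIS X 0201 Roman, where byte 0x5C is the yen sign (U+00A5) and 0x7E is
an overline (U+203E); the function names are hypothetical, and the high
half of the table is left unmapped to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; not Lynx code. */

/* Shortcut with the 7-bit == ASCII assumption baked in: the table is
 * never consulted for bytes below 0x80. */
static unsigned int to_ucs_shortcut(unsigned char b,
                                    const unsigned int high[128])
{
    if (b < 0x80)
        return b;
    return high[b - 0x80];
}

/* Full-table variant: every byte, 7-bit or not, goes through the table. */
static unsigned int to_ucs_full(unsigned char b,
                                const unsigned int full[256])
{
    return full[b];
}

/* Fill a byte -> Unicode table for a JIS X 0201 Roman-like charset.
 * Only the two 7-bit differences from ASCII are set; bytes 0x80-0xFF
 * are marked U+FFFD because this sketch does not map them. */
static void init_jis_roman_like(unsigned int full[256])
{
    size_t i;
    assert(full != NULL);
    for (i = 0; i < 0x80; i++)
        full[i] = (unsigned int) i;
    for (i = 0x80; i < 256; i++)
        full[i] = 0xFFFD;
    full[0x5C] = 0x00A5;    /* yen sign instead of backslash */
    full[0x7E] = 0x203E;    /* overline instead of tilde */
}
```

With such a table, to_ucs_full(0x5C, full) yields U+00A5 while
to_ucs_shortcut(0x5C, full + 128) still yields U+005C. Every place that
takes the shortcut would have to be found and changed, which is why it
may not be worth the trouble.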
Did you ever try any of this *before* nearly all of the separate
translation tables ('old-style') in LYCharSets.c were eliminated?
It should have been a more straightforward change then.
Klaus