Re: lynx-dev reading sjis docs [was Re: lynxcgi problem]
From: Hataguchi Takeshi
Subject: Re: lynx-dev reading sjis docs [was Re: lynxcgi problem]
Date: Fri, 31 Dec 1999 22:12:13 +0900 (JST)
On Thu, 30 Dec 1999, Klaus Weide wrote:
> On Thu, 30 Dec 1999, Hataguchi Takeshi wrote:
> > On Tue, 28 Dec 1999, Henry Nelson wrote:
> >
> Hataguchi Takeshi wrote:
> > > > By the way, I'm wondering whether ASSUME_CHARSET still fails to work
> > > > for Japanese as expected, as you once wrote.
> > > > Do you know the relationship between ASSUME_CHARSET and
> > > > "kanji code", which can be changed by ^L with SH_EX?
> > >
> > > ASSUME_CHARSET is turned off for CJK, as far as I know. Our LAN service
> > > is very unstable right now, so I cannot try to search the archives
> > > for you, but look in the "http://www.flora.org/lynx-dev/html/month1097"
> > > archives, and grep for "did something happen to."
>
> > Thank you very much. Now I see ASSUME_CHARSET is off for CJK.
> > But I've not understood why it's off. I'll continue to check archives.
>
> Can you please describe in detail what you mean by "is off"?
> What did you try, what did you expect, and what did actually happen?
I may have been confusing, sorry. No one actually wrote "is off" in the
thread. I found the following description in lynx.cfg and took it to mean
that ASSUME_CHARSET has no effect when CJK mode is on.
| # Raw (CJK) mode
| #
| # Lynx normally translates characters from a document's charset to display
| # charset, using ASSUME_CHARSET value (see below) if the document's charset
| # is not specified explicitly. Raw (CJK) mode is OFF for this case.
I hadn't tried anything when I wrote my last mail.
I have now tested the files attached to this mail:
  metaEUC.html     Document in euc-jp with a META tag
  metaSJIS.html    Document in shift_jis with a META tag
  metaSJIS2.html   Document in shift_jis with a wrong META tag (x-sjis)
  nometaEUC.html   Document in euc-jp without a META tag
  nometaSJIS.html  Document in shift_jis without a META tag
The first two files gave the expected result. This confirms that the
charset specified in a META tag takes effect, as you wrote.
The third file, metaSJIS2.html, which declares its charset as x-sjis,
gave a bad result.
I know x-sjis isn't in IANA's list of character sets.
But some pages do declare their charset as x-sjis, because Netscape
introduced x-sjis and x-euc-jp on its own and allowed them in the META tag
before Shift_JIS and EUC-JP were registered with IANA.
So I would be happy if Lynx accepted x-sjis and x-euc-jp.
# I referred to this page, but unfortunately it's in Japanese.
# http://www.bekkoame.or.jp/~poetlabo/WWW/charset.html
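To make the request concrete, here is a minimal sketch in Python (not Lynx code) of treating the legacy labels as aliases for the registered names. The table and the function name are hypothetical and only illustrate the idea; how Lynx would actually store such aliases is not something this thread establishes.

```python
# Hypothetical alias table: legacy "x-" labels mapped to the registered names.
# Only the two labels mentioned in this mail are listed.
LEGACY_ALIASES = {
    "x-sjis": "shift_jis",
    "x-euc-jp": "euc-jp",
}


def normalize_charset(label):
    """Return a canonical lowercase charset name for a META charset label."""
    # Tolerate surrounding whitespace, quotes, and mixed case in the label.
    name = label.strip().strip("\"'").lower()
    # Unknown labels pass through unchanged.
    return LEGACY_ALIASES.get(name, name)
```

With such a table, metaSJIS2.html would be handled like metaSJIS.html, since both labels resolve to the same name.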
I tried nometaEUC.html with ASSUME_CHARSET set to euc-jp and
DISPLAY_CHARSET set to Japanese (EUC-JP), but got a bad result,
the same as with ASSUME_CHARSET set to iso-8859-1.
I wanted the same result as from metaEUC.html.
I also got a bad result from nometaSJIS.html with ASSUME_CHARSET set to
shift_jis and DISPLAY_CHARSET set to Japanese (EUC-JP).
It seems ASSUME_CHARSET has no effect in these experiments.
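What I expected from the nometaSJIS.html case can be sketched as follows, in Python rather than Lynx's C. The function name is hypothetical; it only shows the translation from an assumed document charset to the display charset that the lynx.cfg text describes, not how Lynx implements it.

```python
def transcode(raw, assumed_charset, display_charset):
    """Decode raw bytes with the assumed charset, then re-encode for display."""
    text = raw.decode(assumed_charset)
    return text.encode(display_charset)


# The word "Nihongo" (Japanese) as it would be stored in a Shift_JIS document.
sjis_doc = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Expected behaviour: ASSUME_CHARSET=shift_jis, display charset EUC-JP.
euc_out = transcode(sjis_doc, "shift_jis", "euc_jp")
```

The output bytes differ from the input bytes but represent the same text, which is the result I hoped to see on an EUC-JP terminal.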
> I am not aware of ASSUME_CHARSET being explicitly turned off for CJK.
> It's just that ASSUME_CHARSET, basically, has the equivalent effect of
> a META tag with a charset (only with a lower priority); or possibly has
> less effect (no call to HText_setKcode - see below). If an explicit
> charset in a META tag has no effect for CJK, then it is no surprise if
> ASSUME_CHARSET has no effect, either.
The META tag has an effect, but ASSUME_CHARSET doesn't, as I wrote above.
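The priority described in the quoted paragraph could be sketched like this in Python. It models the expected behaviour only, not what Lynx does today; the function name and the iso-8859-1 fallback are assumptions made for illustration.

```python
def effective_charset(meta_charset=None, assume_charset=None,
                      fallback="iso-8859-1"):
    """Pick the document charset: explicit META first, then ASSUME_CHARSET."""
    if meta_charset:
        # An explicit declaration in the document has the highest priority.
        return meta_charset.lower()
    if assume_charset:
        # ASSUME_CHARSET acts like a META charset with lower priority.
        return assume_charset.lower()
    # Assumed default when neither is given (hypothetical choice).
    return fallback
```

Under this model nometaEUC.html with ASSUME_CHARSET set to euc-jp would resolve to the same charset as metaEUC.html, which is not what I observed.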
>
> Well - I expect that ASSUME_CHARSET does have an effect if
> (a) Display Character Set is a CJK character set, and ASSUME_CHARSET points
> to a non CJK charset (possibly only with raw/CJK toggle state being off?)
> or
> (b) Display Character Set is a non-CJK character set, and ASSUME_CHARSET
> points to a CJK charset.
I tried both, but with these settings many Japanese documents are unreadable.
In what kind of situation would I use these settings?
> > > My *hunch* is that ASSUME_CHARSET would not offer much to help Lynx render
> > > Japanese documents. How can you assume?
> >
> > My idea is almost the same as Hiroyuki's manual overriding switch.
> > We usually set it to "Japanese (Auto Detect)" and sometimes
> > set it to "Japanese (Shift_JIS)" or "Japanese (EUC)"
> > when Lynx fails to detect the document character set.
> >
> > I think ASSUME_CHARSET is something that should play this role.
> > Anyway, I'll try to find the reason ASSUME_CHARSET is off for CJK.
>
> The first question should be why the CJK magic doesn't listen to any
> sorts of charset at all. Whether the best way for toggling is via
> the ASSUME_CHARSET mechanism or some other mechanism can then be decided
> later.
I thought it should be ASSUME_CHARSET. But now I don't understand at all
how Lynx does, or should, process ASSUME_CHARSET.
> > Thanks. It seems there are no differences between their outputs.
> > It seems <META ... CONTENT="text/html;charset=hogehoge"> has no effect
> > for Japanese documents.
>
> See especially
> <http://www.flora.org/lynx-dev/html/month1097/msg00110.html>
> and
> <http://www.flora.org/lynx-dev/html/month1097/msg00151.html>
> from the thread that Henry pointed out.
Thank you. I see that the META tag happened to have no effect in Henry's
examples.
# Oops! Henry uses only x-sjis and x-euc-jp as charsets.
# I tried again, replacing x-sjis with Shift_JIS and x-euc-jp with EUC-JP,
# and got the same result.
> The code fragment quoted in the first message is still present in the
> most recent Lynx code. Just search for "if (ch == ' ') {". What it
> means, according to my understanding when I wrote that message (I have
> not re-examined this with the current code, but I assume the effect is
> still the same): First, we go to some trouble to set text->kcode in
> HText_setKcode() (GridText.c), based on the charset in a META. But
> then HText_appendCharacter() goes and almost immediately cancels the
> effect. All it takes is a space (' ') character.
This strategy is useful for documents that mix several charsets,
like Henry's. But I think they are quite rare.
Especially when the charset is declared explicitly, I don't think it is
useful.
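As I understand the quoted explanation, the behaviour can be modelled with a toy in Python. This is not the GridText.c code: the class, the method names and the AUTO value are hypothetical, and the model is based only on the description above (the kanji code set from META is dropped as soon as a space is appended).

```python
AUTO = "auto"  # hypothetical marker: no kanji code fixed, keep guessing


class ToyText:
    """Toy stand-in for the text object described in the quoted message."""

    def __init__(self):
        self.kcode = AUTO
        self.chars = []

    def set_kcode(self, charset):
        # Models the META handler fixing the kanji code for the document.
        self.kcode = charset

    def append_character(self, ch):
        # Models the reset: a single space cancels the declared kanji code.
        if ch == " ":
            self.kcode = AUTO
        self.chars.append(ch)


text = ToyText()
text.set_kcode("shift_jis")
text.append_character("a")
kcode_before_space = text.kcode
text.append_character(" ")
kcode_after_space = text.kcode
```

In this model the declared charset survives only until the first space, so for almost any real document the declaration is lost right away.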
> (Possibly HText_setKcode() should be called from more places, not just
> LYHandleMETA in LYCharUtils.c, but also from MTMIME.c and HTFile.c, at
> least; but given that it has no real effect, it's no surprise that
> those calls have never been added.)
I hope it will also be called from those places.
> > # There are some Japanese documents which declare the WRONG character set.
> > # If Lynx processes the META tag strictly, we can't get proper output
> > # from such wrong pages. I'm wondering whether this is one reason that
> > # Hiroyuki added the manual overriding function.
> > # In the case of NN and IE, it seems they don't process the META tag
> > # strictly. I think that's the reason why such wrong documents exist
> > # in Japan. :-<
> >
> > > > If this is right, I think ASSUME_CHARSET should work properly.
> > > > # "Japanese (Auto Detect)" should be added in the list, if needed.
> > > > Don't you agree with me, Henry?
>
> That is a user interface question that should be deferred until later.
> But it would make more sense to have
>
> x-autodetect_jp # or similar
> in addition to
> shift_jis
> euc-jp
>
> in the Assumed document character set list than having "Japanese (Auto
> Detect)" in the Display character set list. One is for input, the other
> for output, and it is the character encoding of the input that would be
> "detected", not the state of the terminal display. (I guess this is
> basically what you mean when you think about ASSUME_CHARSET?)
Right! Thank you.
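To illustrate what an input-side "x-autodetect_jp" entry might do, here is a rough sketch in Python. It is not Lynx's detection code. The function name and candidate order are assumptions, and the method (try a strict decode with each candidate) is a simple heuristic that can guess wrong for byte sequences valid in both encodings. ISO-2022-JP is not handled, so a 7-bit ISO-2022-JP document would be reported as plain ASCII.

```python
# Candidate order is an arbitrary choice for this sketch.
CANDIDATES = ("euc_jp", "shift_jis")


def detect_japanese_charset(raw):
    """Guess the encoding of raw bytes, or return None if no candidate fits."""
    if all(b < 0x80 for b in raw):
        # Pure 7-bit input needs no Japanese 8-bit decoding.
        return "us-ascii"
    for name in CANDIDATES:
        try:
            raw.decode(name)  # strict decode: invalid bytes raise an error
        except UnicodeDecodeError:
            continue
        return name
    return None
```

A manual override, like setting shift_jis or euc-jp explicitly in the assumed charset list, would still be needed for the documents where such a guess fails.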
--
Takeshi Hataguchi
E-mail: address@hidden
samples.tar.gz
Description: Binary data