Re: lynx-dev hyphenation (was tech. question: translating strings)
From: Vlad Harchev
Subject: Re: lynx-dev hyphenation (was tech. question: translating strings)
Date: Tue, 7 Sep 1999 14:58:49 +0500 (SAMST)
On Mon, 6 Sep 1999, Klaus Weide wrote:
> >[...]
> > - is this correct?). Pattern matching is implemented as finite-state machine
> > in libhnj (the transitions are calculated when reading hydict). Apparently, if
> > two languages use different keycodes, it's possible to concatenate hydicts to
>                               ^^^^^^^^
> > get the hyrules that will hyphenate two languages at the same time -
>
> Why are you talking about keycodes? That doesn't seem to make much sense,
> what kind of "keycodes" do you mean? X11 keycodes? Is the program bound
> to *that*? (I hope not.)
Of course, I incorrectly wrote "keycodes" where I meant "character codes" (but
you could have guessed that).
> That sentence would make more sense to me if you replaced "keycode" with
> "character", with the understanding that "character" is meant in the
> ISO10646/Unicode sense.
I don't know what ISO10646 means.
> > so I'm
> > afraid English phrases like StarDivision will be hyphenated incorrectly if
> > a hydict for French is loaded, since AFAIK French and English use the latin-1
> > encoding (at least the keycodes of both languages are not disjoint).
>
> Well of course phrases in one language will be hyphenated incorrectly if
> patterns for a different language are used. That should be no surprise!
> (The only way around wrong hyphenation of proper names etc. from another
> language that I can think of would be to have specific exceptions, in your
> case in the French patterns.)
Yes.
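
The interaction discussed above can be tried out with a minimal sketch of
TeX-style (Liang) pattern matching. This is not libhnj's code: libhnj builds a
finite-state machine, while the sketch simply tries every substring. The three
pattern sets are made-up toy data, not real English, French or Russian
hyrules, and the function names are hypothetical.

```python
import re

def parse_pattern(pat):
    """Split a TeX-style pattern such as 'a1b' into ('ab', [0, 1, 0])."""
    letters = re.sub(r"\d", "", pat)
    levels = [0] * (len(letters) + 1)
    i = 0
    for ch in pat:
        if ch.isdigit():
            levels[i] = int(ch)   # digit sits in the gap before letter i
        else:
            i += 1
    return letters, levels

def hyphenate(word, patterns):
    """Naive Liang matching: odd levels allow a break, even levels forbid it."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels:
                for k, lv in enumerate(levels):
                    scores[start + k] = max(scores[start + k], lv)
    pieces, last = [], 0
    for j in range(1, len(word)):
        if scores[j + 1] % 2 == 1:   # gap before word[j]
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)

# Toy data only (assumptions for illustration, not real hyrules).
TOY_LATIN = ["r1d"]
TOY_CYRILLIC = ["о1л", "о1к"]
TOY_OTHER_LATIN = ["r2d", "a1r"]

print(hyphenate("stardivision", TOY_LATIN + TOY_CYRILLIC))
print(hyphenate("молоко", TOY_LATIN + TOY_CYRILLIC))
print(hyphenate("stardivision", TOY_LATIN + TOY_OTHER_LATIN))
```

With disjoint letter sets the merged toy patterns leave each word's result
unchanged; merging two Latin-letter sets lets the second set override the
break that the first one allowed.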
> The fact that the dictionary allows you to combine patterns for two (or
> more) languages IF AND ONLY IF their letter repertoires are completely
> non-overlapping should be viewed as a hack that can be useful in some
> situations, nothing more. You can use one set of (combined) patterns for
> Russian+English, but it won't work for Russian+German or Russian+Ukrainian
> (assuming the two languages need different patterns), and you cannot use
> the same combined set for Ukrainian+English (under the same assumption) or
I don't understand why you say "the same" (taken literally, the answer is
trivial). You seem to want to stress that "language information" must be
associated with the hyrules; otherwise I don't see what your conclusion is.
If you mean that the hyrules must state which language they apply to, then of
course there will be a comment in the hyrules saying so, and it is the user's
responsibility to read, understand and remember it, and not to select German
hyrules for English texts. From here on I assume that the user is not
brain-damaged. And I don't want to build foolproof protection into the hyrules
stuff (Unicode in any of its incarnations cannot provide that anyway, at least
not for German and English).
> for German+English or for Russian+German.
>
> In fact, you cannot express that last one at all, not to mention
Do you really mean the "last one", i.e. R+G, not G+E?
I don't know which letter codes German uses (IMO only a few non-Latin
characters, not more than 5).
> combinations like Russian+Greek, unless you go to transform and apply the
> patterns in some representation of UCS (since no 8-bit charset I know has
> all the necessary letters combined). That combining effect is only useful
> for X+English (and one or two other languages of all human languages in
> place of English), since only for English (and those other one or two) is
> the necessary character repertoire completely part of the common 8-bit
> charsets. (Well not even that; if the hyphenation patterns remain bound
I assume that the hyphenation rules for, say, Russian don't contain any Latin
letters. So I don't understand which "X+English" you are talking about.
> to a specific charset, you cannot put patterns for the correct spelling
> of such useful words as naïve or blasé or brassière in a combined Russian+
It's obvious that E+G hyphenation rules won't work together correctly even if
they are written in Unicode, since there is no separate "capital English
letter F" and "capital German letter F".
This is the first time I see these words, so maybe they are not so useful :)?
What do they mean?
Leaving these words out won't hurt the hyphenation of an entire E+R document
too much.
> English dictionary.)
>[...]
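
The charset point quoted above can be checked with a short sketch. It assumes
Python's standard codecs for ISO-8859-1, KOI8-R and ISO-8859-7; the sample
words and the helper name `encodable` are chosen here for illustration only.

```python
# One and the same byte is a different letter in different 8-bit charsets.
raw = bytes([0xEF])
as_latin1 = raw.decode("iso-8859-1")   # Latin small letter i with diaeresis
as_koi8r = raw.decode("koi8-r")        # Cyrillic capital letter O

def encodable(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

words = ["naïve", "молоко", "λόγος"]
for charset in ("iso-8859-1", "koi8-r", "iso-8859-7", "utf-8"):
    print(charset, [encodable(w, charset) for w in words])
```

Each of the three 8-bit charsets can hold only one of the three sample words,
while a UCS encoding such as UTF-8 holds all of them.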
And IMO you haven't seen the TeX (and libhnj) hyphenation rules for Russian:
they don't contain any Latin characters (even though all Latin characters are
present in every Cyrillic encoding). The same holds for other languages, since
the hyrules are built by scanning a lot of text in that language.
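
Whether two sets of hyrules share any letters can be tested mechanically. The
sketch below assumes TeX-style patterns (letters, digits, and "." as the
word-boundary marker); the pattern lists are toy data and the function names
are hypothetical, not part of libhnj.

```python
def letter_repertoire(patterns):
    """Collect the letters used by a pattern list, ignoring digits and '.'."""
    return {ch for pat in patterns for ch in pat
            if not ch.isdigit() and ch != "."}

def safe_to_concatenate(patterns_a, patterns_b):
    """True if the two pattern sets have no letter in common."""
    return letter_repertoire(patterns_a).isdisjoint(letter_repertoire(patterns_b))

# Toy data only (assumptions for illustration, not real hyrules).
toy_russian = ["о1л", "о1к", ".по1"]
toy_english = ["r1d", "1tion"]
toy_german = ["ü1b", "s1t"]

print(safe_to_concatenate(toy_russian, toy_english))
print(safe_to_concatenate(toy_english, toy_german))
```

A check like this only says that the patterns cannot interfere with each
other; it says nothing about whether the text being hyphenated is in one of
the two languages.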
Please state the conclusion of everything you wrote after my "Yes". I can't
deduce it (or is it just "use Unicode!"?).
Best regards,
-Vlad