freecats-dev

RE: [Freecats-Dev] japanese etc.


From: Henri Chorand
Subject: RE: [Freecats-Dev] japanese etc.
Date: Tue, 15 Jul 2003 10:17:06 +0200

Hi all,

> > For instance, for Japanese, Jean-Christophe Helary seems the perfect
> > candidate, and Julien Poireau, who studied this language for some
> > time, and of course Yves Champollion (who also speaks Japanese
> > at home) could help him in this task. I believe Julien is eager to start
> > a topic about Japanese, he's been thinking about it for some time.
>
> well, i am not sure i am a candidate for anything but, well, if i know
> what the task is about i can consider it :)

You must be better than I am at Japanese, it can't be otherwise ;-)

To put things simply, though: let's consider that we are starting from a
rather Western-language-centric approach, which must be (at least) refined
and maybe completely rethought, in order to achieve better results than
present tools (which are not very good yet, if I remember correctly what
you said).

> right now i have a few books on japanese linguistics here at home.
> i left a lot of them in france, especially all the ones about computer
> processing of the japanese language. note that i never read them
> before coming though :(

Well, maybe you can have all of them sent to you ;-)

> (...)
> besides, i have absolutely no idea about the basics of tm theory,
> the only thing i know is roughly that the bigger the processed
> string is the more accurate will be the match, see how rough it
> is ?

I can at least say this: the match between two sentences / strings is
higher the closer they are to each other (sorry for the English). They are
closer if they have more words (or portions of words) in common, if fewer
words are specific to only one of the two sentences, and if the words they
share appear in a more similar order.
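
As a rough illustration of the three criteria above (shared words, words
specific to one sentence, and word order), here is a minimal Python sketch.
The function name match_score, the order_weight parameter and the equal
weighting are assumptions made for this example only; nothing here is a
scoring method the project has agreed on, and it assumes words can be found
by splitting on whitespace.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str, order_weight: float = 0.5) -> float:
    """Hypothetical fuzzy-match score between two segments, in [0, 1]."""
    # Assumption: words are separated by whitespace (Western-style text).
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a and not words_b:
        return 1.0  # two empty segments are treated as identical

    set_a, set_b = set(words_a), set(words_b)
    # Shared words raise the score; words found in only one segment
    # enlarge the union and therefore lower it (Jaccard index).
    overlap = len(set_a & set_b) / len(set_a | set_b)

    # SequenceMatcher compares the two word lists in order, so the same
    # words in a different sequence score lower than an identical sequence.
    order = SequenceMatcher(None, words_a, words_b).ratio()

    return (1 - order_weight) * overlap + order_weight * order


if __name__ == "__main__":
    print(match_score("open the file menu", "open the edit menu"))
    print(match_score("open the file menu", "menu file the open"))
```

Two segments with the same words in a different order score below an exact
match but above two segments with nothing in common.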

Now, that's what I call "Le degré zéro de la TAO" (the zero degree of
computer-assisted translation), may the shade of Roland Barthes forgive me.

Of course, the above implies that we know how to identify words, even in
Japanese, and Julien explained to me how difficult this could prove. This
is why I will say one more thing: we might choose to process strings at the
character level, whether the characters are of Chinese origin or simply
phonetic.
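
To make the character-level idea concrete, here is a minimal Python sketch
that compares two strings by their overlapping character pairs instead of
by words, so no word segmentation is needed. The names char_ngrams and
char_match_score, the choice of pairs (n = 2) and the Dice coefficient are
assumptions made for this example, not a design decided for the project.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_match_score(a: str, b: str, n: int = 2) -> float:
    """Hypothetical character-level match score in [0, 1] (Dice coefficient)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are shorter than n characters: fall back to equality.
        return 1.0 if a.strip() == b.strip() else 0.0
    # Counter intersection keeps, for each n-gram, the smaller of the two counts.
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    # "translation memory" vs. "machine translation": one shared character pair.
    print(char_match_score("翻訳メモリ", "機械翻訳"))
```

The same function works unchanged on Western text, which makes it easy to
compare character-level and word-level scores on the same segments.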

Also, the above implies we know how to determine portions of words, but this
seems less relevant (or rather, entirely irrelevant) for Japanese.

I think I'll stop there on this tricky subject of Japanese, or I'm going to
make a fool of myself. Whatever.


> i'd like to have some background in the area before starting anything
> and if anyone of you can give me pointers to online articles and/or
> affordable books (i accept second hand donations, or copies eventually
> :) i'll gladly read them. btw, such a list could be on the project page
> for would-be participants...

From the sources you mention, you are certainly more up to date than I'll
ever be as far as Japanese is concerned.

Julien may send you some links he recently found. At least, it seems a
number of teams are working on topics of interest to us :-)

Oh, and one last idea: to get things started smoothly, you could briefly sum
up what (to your knowledge) the existing CAT tools can and cannot do with
Japanese.


Cheers,

Henri




