freecats-dev

[Freecats-Dev] Segmenting and indexing Japanese


From: Azerty Traductions
Subject: [Freecats-Dev] Segmenting and indexing Japanese
Date: Fri, 11 Jul 2003 13:49:27 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.4) Gecko/20030624

Hi all!

As you know, the main difficulty in segmenting Japanese text comes from its morphology. When I studied Japanese at university, the documents were laid out much like European text: punctuation and spaces were added to improve readability for beginners (a sort of "Japanese for dummies"). In real life, things are completely different. No spaces, few commas, text often written vertically in columns running from right to left... A nightmare for newbies!

Dictionary-based methods achieve excellent results on general subjects. What about highly technical translations? There, many terms are missing from any general dictionary, so the approach just can't work.
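
To make the dictionary problem concrete, here is a minimal sketch of a greedy longest-match segmenter in Python. It is only an illustration: the function name, the tiny lexicon and both sample sentences are made up for this message, and real dictionary-based tools are far more elaborate.

```python
def segment_longest_match(text, lexicon):
    """Greedy forward longest match.

    At each position, take the longest lexicon entry that starts there.
    If nothing matches, fall back to a single character.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical "general subjects" lexicon (illustrative only).
LEXICON = {"私", "は", "日本語", "を", "勉強", "します"}

# Every word is in the lexicon: the sentence is split cleanly.
general = segment_longest_match("私は日本語を勉強します", LEXICON)

# The technical term at the start is not in the lexicon:
# it falls apart into single characters.
technical = segment_longest_match("形態素解析を勉強します", LEXICON)

print(general)
print(technical)
```

With this toy lexicon the everyday sentence comes out as whole words, while the unknown technical compound is broken into single characters, which is the failure described above.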

The solution proposed by the Advanced Telecommunications Research Institute (http://www.atr.co.jp/index_e.html) is as follows: a morphological analysis based on decision trees (e.g. 'na' appears at the end of some adjectives, so a word ending in 'na' is probably an adjective...). Using an array of suffixes and a list of constraints with statistics, it is then possible to segment a text...
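
As a rough illustration of the suffix idea (and not ATR's actual method, which I do not know in detail), here is a much simpler stand-in in Python: a flat table of suffix rules instead of a real decision tree. The rule table, the scores and the function name are all invented for the example, and the words are romanized to keep it readable.

```python
# Hypothetical suffix table: (suffix, guessed category, made-up score).
# The scores are NOT real statistics; they only stand in for them.
SUFFIX_RULES = [
    ("masu", "verb", 0.9),
    ("na", "adjective", 0.7),
    ("i", "adjective", 0.5),
]


def guess_category(word):
    """Guess a word category from the longest matching suffix.

    Returns (category, score), or ("unknown", 0.0) when no rule applies.
    """
    best = ("unknown", 0.0)
    best_len = 0
    for suffix, category, score in SUFFIX_RULES:
        # The word must be longer than the suffix itself.
        matches = word.endswith(suffix) and len(word) > len(suffix)
        if matches and len(suffix) > best_len:
            best = (category, score)
            best_len = len(suffix)
    return best


for word in ["shizukana", "tabemasu", "takai", "hon", "sakana"]:
    print(word, guess_category(word))
```

Note that "sakana" (fish) is a noun but still gets guessed as an adjective here. A suffix alone is not enough, which is presumably why the approach also needs the list of constraints and the statistics.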

There are many conferences and summits about segmenting and indexing Japanese and Chinese texts. It would be great if we could get in touch with some of the speakers!

Julien
(I only have a very basic knowledge of Japanese, sorry for not being more specific!)
