aspell-user

Re: International support


From: Samphan Raruenrom
Subject: Re: International support
Date: Mon, 11 Jan 1999 21:46:29 +0700

Kevin Atkinson wrote:
> Samphan Raruenrom wrote:
> > In Thai, we don't put spaces between words at all, so
> > the same situation happens naturally.
> > Typical Thai word-segmentation algorithms (which usually
> > do spell checking as well) use a maximal-match backtracking
> > algorithm with trie word list(s).
> > My implementation is at http://www.thai.net/libinthai/
> > IBM Classes for Unicode implementation is at
> > http://www.ibm.com/java/education/boundaries/boundaries.html
> OK, so how do you detect boundaries of unknown or misspelled words?

IBM ICU's algorithm, described at the above URL, is:
: If we exhausted our possibilities without finding 
: a valid sequence of words, it either means there's
: an error in the text, or the text includes a word 
: that isn't in the dictionary. In either case, we restore
: the set of break positions that matched the most 
: characters, advance one character past where the
: mismatch occurred in that sequence, and start over 
: from there. This works pretty well: usually only
: one or two boundary positions around the error 
: are in the wrong place.
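
The recovery step quoted above can be sketched roughly as follows. This is only an illustration in Python, not the ICU code and not the libinthai code. All names (build_trie, match_lengths, best_breaks, segment) are made up for the example, and merging consecutive unmatched characters into a single unknown run is my own assumption rather than something the quoted text specifies. Unspaced English stands in for Thai to keep the word list easy to check.

```python
END = ""  # trie marker for "a word ends here"; cannot collide with a 1-char key


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def match_lengths(trie, text, start):
    """Lengths of all dictionary words beginning at text[start], longest first."""
    node, lengths = trie, []
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if END in node:
            lengths.append(i - start + 1)
    return lengths[::-1]


def best_breaks(trie, text, start):
    """Maximal-match backtracking search from `start`.

    Returns the break positions (word ends) that matched the most
    characters; the last one equals len(text) if everything matched.
    """
    best = []
    dead = set()  # positions already shown not to reach the end of the text

    def walk(pos, breaks):
        nonlocal best
        if breaks and (not best or breaks[-1] > best[-1]):
            best = list(breaks)
        if pos == len(text):
            return True
        if pos in dead:
            return False
        for n in match_lengths(trie, text, pos):  # longest candidate first
            breaks.append(pos + n)
            if walk(pos + n, breaks):
                return True
            breaks.pop()  # backtrack and try a shorter word
        dead.add(pos)
        return False

    walk(start, [])
    return best


def segment(text, words):
    """Split text into (token, is_known) pairs, recovering from mismatches."""
    trie = build_trie(words)
    tokens, pos = [], 0
    while pos < len(text):
        # Restore the break positions that matched the most characters.
        for end in best_breaks(trie, text, pos):
            tokens.append((text[pos:end], True))
            pos = end
        if pos < len(text):
            # Mismatch at pos: step one character past it and start over.
            # Assumption: adjacent unmatched characters form one unknown run.
            if tokens and not tokens[-1][1]:
                tokens[-1] = (tokens[-1][0] + text[pos], False)
            else:
                tokens.append((text[pos], False))
            pos += 1
    return tokens


WORDS = {"the", "them", "cat", "sat", "on", "mat"}
clean = segment("thecatsatonthemat", WORDS)
noisy = segment("thecatsxtonthemat", WORDS)
```

In the first call the search first tries "them", finds nothing in the word list that starts at "at", and backtracks to "the" + "mat". In the second call the misspelled "sxt" is isolated as an unknown run and the words on either side of it are still found. The recursion is fine for short inputs; long texts would want an iterative search.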
