Re: [aspell-devel] Thoughts on using aspell for Indian language checking
From: Kevin Atkinson
Subject: Re: [aspell-devel] Thoughts on using aspell for Indian language checking
Date: Sun, 12 Nov 2006 06:38:54 -0700 (MST)
Some quick comments below. More later.
On Sat, 11 Nov 2006, address@hidden wrote:
> Hello,
> Over the last couple of months, I have been working with some people
> who are implementing an Indian language search engine, at
> http://raftaar.org (the online keyboard on the site presently works only
> in Microsoft IE, but will soon be ported to Firefox and other
> open-source browsers). The site uses aspell to provide suggestions for
> misspelled words.
> There are still significant issues in making aspell work properly with
> Indian languages. Here are some thoughts based on the work that has been
> done on aspell 0.60.3:
> 1. I would like to volunteer to work on writing a proper C++ interface
>    to aspell. This would include a public interface that exposes only
>    the normal spellchecking facilities in a class, as well as a testing
>    interface that provides access to internals like the scores, weights,
>    and even costs for computing edit distance. I already have something
>    that makes the testing part available, but it is rather hacked up. If
>    we can discuss what might be an interface that can get accepted into
>    aspell, I would be glad to work on it.
I assume you are already aware that Aspell has a C interface. I do not
have a C++ interface for one very good reason: binary compatibility. If
anything, I would like to improve the existing C interface to support the
extended functionality, not create a new one.
> 2. I have done some more work on making bindings to aspell available in
>    other programming languages, and, at present, Python, Perl and C#
>    bindings are available through SWIG. What I would like to do is
>    first build a C++ class-based interface, and use that as a basis
>    for a consistent interface across all languages. Besides the bindings,
>    this would include example programs for using them, as well as GUI
>    implementations in at least one language that provide a front-end to
>    spellchecking, as well as to the testing framework.
This is better done via the C interface.
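As a rough illustration of binding directly to the C interface, with no C++ or SWIG layer in between, here is a minimal Python sketch using the standard `ctypes` module. It assumes the function prototypes declared in `aspell.h` for the 0.60 series (`new_aspell_config`, `aspell_config_replace`, `new_aspell_speller`, `to_aspell_speller`, `aspell_speller_check`, and the matching `delete_*` functions); the `Speller` class, the default `lang` value and the choice of `utf-8` for the `encoding` option are illustrative assumptions, not part of Aspell.

```python
import ctypes
import ctypes.util

# Every Aspell object is handled as an opaque pointer: its layout never
# crosses the library boundary, which is what keeps the C ABI stable.
_VOIDP = ctypes.c_void_p


def load_aspell():
    """Load libaspell and declare the prototypes used below.

    Returns None if the shared library cannot be found.
    """
    path = ctypes.util.find_library("aspell")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    prototypes = {
        # name: (restype, argtypes), assumed to follow aspell.h
        "new_aspell_config": (_VOIDP, []),
        "aspell_config_replace": (ctypes.c_int, [_VOIDP, ctypes.c_char_p, ctypes.c_char_p]),
        "new_aspell_speller": (_VOIDP, [_VOIDP]),
        "aspell_error_number": (ctypes.c_uint, [_VOIDP]),
        "aspell_error_message": (ctypes.c_char_p, [_VOIDP]),
        "to_aspell_speller": (_VOIDP, [_VOIDP]),
        "aspell_speller_check": (ctypes.c_int, [_VOIDP, ctypes.c_char_p, ctypes.c_int]),
        "delete_aspell_speller": (None, [_VOIDP]),
        "delete_aspell_can_have_error": (None, [_VOIDP]),
        "delete_aspell_config": (None, [_VOIDP]),
    }
    for name, (restype, argtypes) in prototypes.items():
        func = getattr(lib, name)
        func.restype = restype
        func.argtypes = argtypes
    return lib


class Speller:
    """Hypothetical class-style wrapper built only on the C interface."""

    def __init__(self, lib, lang="en"):
        self._lib = lib
        self._config = lib.new_aspell_config()
        lib.aspell_config_replace(self._config, b"lang", lang.encode("ascii"))
        lib.aspell_config_replace(self._config, b"encoding", b"utf-8")
        result = lib.new_aspell_speller(self._config)
        if lib.aspell_error_number(result) != 0:
            message = lib.aspell_error_message(result).decode("utf-8", "replace")
            lib.delete_aspell_can_have_error(result)
            lib.delete_aspell_config(self._config)
            raise RuntimeError(message)
        self._speller = lib.to_aspell_speller(result)

    def check(self, word):
        """Return True if aspell accepts the word."""
        data = word.encode("utf-8")
        return self._lib.aspell_speller_check(self._speller, data, len(data)) == 1

    def close(self):
        """Release the speller and config handles."""
        if self._speller is not None:
            self._lib.delete_aspell_speller(self._speller)
            self._lib.delete_aspell_config(self._config)
            self._speller = None
```

Because only plain C functions and opaque pointers are involved, a wrapper like this keeps working across library versions as long as those prototypes do not change, which is the binary-compatibility argument for building bindings on the C interface.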
> 3. I see some major stumbling blocks in making aspell work properly with
>    Indian languages. Perhaps the most significant one is that in Indian
>    languages it makes sense to deal with syllables (a clump of consonants,
>    possibly with vowel modifiers), rather than with individual characters.
>    Thus, for example, edit distance operations should work on syllables.
>    This is a little difficult, though not impossible, to do with the
>    present, non-Unicode, internal functioning of aspell. One way would be
>    to have a function inside score_list() that reconverts to Unicode,
>    and works on syllables. However, it seems silly to do this, rather
>    than having Unicode throughout. I am aware of Kevin's arguments for
>    retaining the 128-character space used by aspell, but do not see a
                   ^^^ that's 256.
>    clean mechanism for handling complex scripts within such a framework.
>    Comments on this would be appreciated.
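To make the syllable-versus-character point concrete, here is a minimal Python sketch of edit distance computed over syllables. It is a standalone illustration, not aspell's score_list(); the helpers `is_virama`, `syllables` and `edit_distance` are hypothetical, and the segmentation is a simplified heuristic (combining marks attach to the preceding unit, and a letter after a virama joins the same consonant cluster) that ignores ZWJ/ZWNJ and other script-specific rules.

```python
import unicodedata


def is_virama(ch):
    """True for the virama (halant) sign of any Indic script."""
    return unicodedata.name(ch, "").endswith("SIGN VIRAMA")


def syllables(word):
    """Split a word into approximate orthographic syllables (aksharas).

    Heuristic: combining marks (vowel signs, nukta, anusvara, virama)
    attach to the preceding unit, and a letter that follows a virama
    joins the same consonant cluster. ZWJ/ZWNJ are not handled.
    """
    units = []
    for ch in unicodedata.normalize("NFC", word):
        category = unicodedata.category(ch)
        attaches = units and (
            category.startswith("M")
            or (category == "Lo" and is_virama(units[-1][-1]))
        )
        if attaches:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x -> y
        prev = cur
    return prev[-1]
```

For the pair रक्षा / रखा, `edit_distance` over code points gives 3, while `edit_distance(syllables("रक्षा"), syllables("रखा"))` gives 1, since only the second syllable differs. The uniform costs here could be replaced by the tunable weights mentioned in point 1.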
Please explain the issue to me in detail or give me some good links.
Offhand, without knowing the full details of the problem, the answer would
be to store them internally as syllables rather than as full characters.
If necessary, make use of the Unicode normalization code to convert input
from full characters to syllables and back again.
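The idea of storing syllables internally could be prototyped with a table that gives each syllable a one-byte code, so that byte-oriented code (such as an edit-distance loop) sees one syllable per byte. This is a minimal sketch under stated assumptions: input is already segmented into syllables, the `SyllableTable` class is hypothetical and not part of Aspell, code 0 is reserved, and the table refuses to grow past the remaining 255 codes, since the syllable inventory of a real dictionary may not fit.

```python
class SyllableTable:
    """Hypothetical map between syllables and single-byte internal codes.

    Code 0 is reserved (e.g. as a terminator), leaving codes 1..255.
    """

    def __init__(self):
        self._to_code = {}
        self._from_code = [None]  # index 0 is the reserved code

    def code_for(self, syllable):
        """Return the code for a syllable, assigning a new one if needed."""
        code = self._to_code.get(syllable)
        if code is None:
            if len(self._from_code) > 255:
                raise ValueError("syllable table full: more than 255 syllables")
            code = len(self._from_code)
            self._to_code[syllable] = code
            self._from_code.append(syllable)
        return code

    def encode(self, units):
        """Encode a list of syllables as one byte per syllable."""
        return bytes(self.code_for(u) for u in units)

    def decode(self, data):
        """Turn internal bytes back into the original Unicode text."""
        return "".join(self._from_code[b] for b in data)
```

With a fresh table, `encode(["र", "क्षा"])` yields `b"\x01\x02"` and `decode` restores `"रक्षा"`. Whether a per-language table of this kind fits in 255 entries is exactly the open question in this thread.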