Re: [aspell-devel] Affix compression
From: Kevin Atkinson
Subject: Re: [aspell-devel] Affix compression
Date: Sat, 23 Jul 2005 18:24:57 -0600 (MDT)
On Sat, 23 Jul 2005, Gregory Maxwell wrote:
> I was wondering if anyone has looked at using libjudy
> (http://judy.sourceforge.net/) for storing words with aspell?
No, I have not. And it is unlikely I will unless someone with a deep
understanding of how Aspell works internally (hint: study readonly_ws.cpp
and suggest.cpp) convinces me it will improve on the existing
implementation.
> Libjudy provides a number of sparse array data structures which
> provide very fast lookups, because they are cache aware, and
> reasonable memory efficiency. There is a function in the libjudy
> package that provides a string indexed array which is quite space
> efficient because it is prefix compressed.
>
> I don't have a standalone metaphone encoder handy, but just passing
> /usr/dict/words on my Pentium M laptop into JudySL gives 0.304
> µs/word lookups using only 10 MB of core, which is only 2x the size
> of the file.
These numbers are meaningless to me. I need a comparison with Aspell's
existing implementation.
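As a rough illustration of the kind of side-by-side measurement meant here,
the following Python sketch times lookups against two containers built from
the same word list. The two containers (a sorted list searched with bisect
and a hash set) are stand-ins chosen only for illustration; neither is
Aspell's word list structure nor Judy, and all function names are
hypothetical.

```python
import bisect
import time


def time_lookups(lookup, words, repeats=5):
    """Return the best observed time per lookup, in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for w in words:
            lookup(w)
        best = min(best, time.perf_counter() - start)
    return best / len(words) * 1e6


def make_sorted_lookup(words):
    """Stand-in structure 1: binary search over a sorted list."""
    table = sorted(words)

    def lookup(w):
        i = bisect.bisect_left(table, w)
        return i < len(table) and table[i] == w

    return lookup


def make_hash_lookup(words):
    """Stand-in structure 2: membership test on a hash set."""
    table = set(words)
    return lambda w: w in table


def compare(words):
    """Time every candidate on the same words; returns {name: us per lookup}."""
    candidates = {
        "sorted list": make_sorted_lookup(words),
        "hash set": make_hash_lookup(words),
    }
    return {name: time_lookups(fn, words) for name, fn in candidates.items()}
```

The point of the sketch is only that both candidates see the same input on
the same machine, so the resulting figures can be compared with each other.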
> It would be easy to provide code with this data structure which quickly
> found the longest match and all other entries of the same match
> length, perhaps something which would be useful in aspell as well...
Maybe. Since you thought of the idea, the burden of proof is on you. I
need concrete examples of how using Judy will improve on Aspell's
existing behavior.
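For reference, one possible reading of the "longest match and all other
entries of the same match length" operation is sketched below in Python
with a plain dictionary-based trie. This is an assumption about what was
meant, not Judy's API and not anything taken from Aspell; the names
build_trie and longest_match are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def longest_match(root, query):
    """Walk the trie as far as the query allows.

    Returns (depth, entries): depth is the length of the longest prefix
    of query shared with any stored word, and entries is the sorted list
    of every stored word sharing that prefix.
    """
    node = root
    depth = 0
    for ch in query:
        if ch not in node:
            break
        node = node[ch]
        depth += 1

    # Collect every word stored at or below the node where the walk stopped.
    entries = []
    stack = [(node, query[:depth])]
    while stack:
        cur, prefix = stack.pop()
        for key, child in cur.items():
            if key == "":
                entries.append(prefix)
            else:
                stack.append((child, prefix + key))
    return depth, sorted(entries)
```

With the words spell, spelled, speller, spend and aspell stored, the query
"spelling" stops after the five characters of "spell" and returns spell,
spelled and speller.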
--
http://kevin.atkinson.dhs.org