Re: [Aspell-user] aspell_speller_add_to_personal() doesn't accept hyphens
From: Michael Bunk
Subject: Re: [Aspell-user] aspell_speller_add_to_personal() doesn't accept hyphens
Date: Mon, 15 Aug 2005 08:36:05 +0530
User-agent: KMail/1.5.4
Hi Gary,
thanks for your reply, though you didn't make it too easy for me :)
> Why are you using such an early version? I had issues with that
> aspect of aspell some time ago. I thought they were resolved.
I have tried 0.60.3 now, but the behaviour is the same.
> You can see what I did in this project:
> http://sourceforge.net/projects/descdatadiary/
> It is a windows project. It does not try to do a Unix install,
> but you may find what I did in speller_impl.cpp interesting.
I have seen that you modified
aspell-0.60.1-win32/modules/speller/default/speller_impl.cpp
by implementing 3 new functions:
int aspell_speller_word_seperator_length(speller, char *)
It returns the number of bytes until the next word character, testing each
character with the negated aspell-internal function lang_->is_alpha().
int aspell_speller_word_length(speller, char *)
It returns the number of bytes until the next non-word character, using
lang_->is_alpha() as well.
aspell_speller_add_lower_to_personal()
This adds a lowercased version of the given string to the personal word list.
I guess you implemented this for capitalized words at sentence starts?
The problem I see with this approach is that you modified aspell's internals.
Since I want to use aspell as an unmodified library, such changes are ruled
out for me.
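
For illustration, here is a rough sketch of how the two length helpers could
live in application code instead of inside aspell. It is only a sketch under
assumptions of mine: it uses the C library's isalpha() in place of
lang_->is_alpha(), so it only handles ASCII in the default locale, and it
treats '-' as a word character so that hyphenated words stay in one piece.
The names is_word_char, word_separator_length and word_length are
hypothetical and not part of aspell.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for lang_->is_alpha(): ASCII letters plus '-'.
 * Counting '-' as a word character is an assumption of this sketch. */
static int is_word_char(unsigned char c)
{
    return isalpha(c) || c == '-';
}

/* Number of bytes from s up to the next word character
 * (the length of the leading run of separators). */
size_t word_separator_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !is_word_char((unsigned char)s[n]))
        n++;
    return n;
}

/* Number of bytes from s up to the next non-word character
 * (the length of the leading word). */
size_t word_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && is_word_char((unsigned char)s[n]))
        n++;
    return n;
}
```

A caller would alternate between the two: skip word_separator_length()
bytes, then take word_length() bytes as the next word, until the end of the
string is reached.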
While looking through the code I found that aspell implements a Tokenizer
class, which seems to be designed for the same job. It is not exported, but
it is used by the DocumentChecker class. Maybe I should try to use that?
But the documentation of its process function in aspell.h is confusing
(besides being misspelled :):
/* process a string
* The string passed in should only be split on white space
* characters. Furthermore, between calles to reset, each string
* should be passed in exactly once and in the order they appeared
* in the document. Passing in stings out of order, skipping
* strings or passing them in more than once may lead to undefined
* results. */
void aspell_document_checker_process(struct AspellDocumentChecker * ths,
                                     const char * str, int size)
Does it mean I have to split the string I want checked at white space myself
before passing the pieces to this function? Or does it mean that the function
itself normally splits only at white space?
Kindest regards,
Michael