[Aspell-user] special chars getting counted as a word


From: Aaron Miller
Subject: [Aspell-user] special chars getting counted as a word
Date: Sun, 12 Aug 2007 21:46:03 -0700

Hello,

I am playing around with aspell as a server-side spell checker for a
Flash application. It works beautifully (and fast as hell too!), but I
did notice one little oddity that I haven't been able to find an
explanation for in the docs.

The problem happens when there is a special character in the text. I
am not sure which special characters cause my word-counting algorithm
to fail, but here is an example containing the one that caused the
breakage (a long dash that was in some text copied from a wiki).

http://labs.splashlabs.com/spellcheck/1186978249

When I pipe the above file through aspell (en_US), I get back this result:

aspell -a < 1186978249
@(#) International Ispell Version 3.1.20 (but really Aspell 0.50.5)
*
*
& geniral 5 10: general, genital, genial, generally, generals
*

So it appears to count the lone character as a word. In my own
program, I have to count words to find the start and end character
indexes of the misspelled word. Since my algorithm does not count the
dash as a word, my word count gets out of sync with aspell's.
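
One possible way to avoid counting words at all is to read the position straight from the `&` line. The sketch below assumes the Ispell pipe-mode layout `& word count offset: suggestions`, so that the 10 in the output above is the offset of "geniral" within its input line. The helper name `parse_miss` is hypothetical, and whether the offset is measured in bytes or characters for multi-byte input such as a long dash is an assumption that should be checked against the aspell version in use.

```python
import re

# Matches a near-miss line such as:
#   & geniral 5 10: general, genital, genial, generally, generals
# Assumed layout: "& <word> <count> <offset>: <suggestion>, <suggestion>, ..."
MISS_RE = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")


def parse_miss(line):
    """Return word, start/end offsets and suggestions, or None if the
    line is not a near-miss ("&") line. Hypothetical helper."""
    match = MISS_RE.match(line)
    if match is None:
        return None
    word, _count, offset, rest = match.groups()
    start = int(offset)
    return {
        "word": word,
        "start": start,
        # Assumption: offset and length use the same unit (characters).
        "end": start + len(word),
        "suggestions": rest.split(", "),
    }


result = parse_miss(
    "& geniral 5 10: general, genital, genial, generally, generals"
)
```

With this approach the `*` lines can simply be skipped, so it no longer matters whether a lone special character is counted as a word.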

Are there any options I can pass to prevent it from being counted? Or
is there a way to find out exactly what aspell counts as a word so I
can match my own regex to it?

Thanks for any advice!
...aaron


-- 
Aaron Miller
Chief Technology Officer
Splash Labs, LLC.
address@hidden  |  206-328-5485
http://www.splashlabs.com



