From: Kevin Atkinson
Subject: Re: [Aspell-user] small bug: two following non alpha characters
Date: Tue, 1 Nov 2005 19:03:38 -0700 (MST)
On Wed, 26 Oct 2005, Gary Setter wrote:
Back in August I was trying to make my program work with Unicode and the koi8-r character set. One of the problems was tokenizing the text into words. It seemed aspell was treating all character sets as ASCII.
Could you be more specific?
The speller object does have a language member, and the language member does have a sense of the characteristics of each character in the character set. What are the characteristics of the ampersand and dash in your character set? Might aspell make use of those character-set-specific characteristics to tokenize "hait-l'-ovraedje" as one word?
Yes, it might make sense, but I do not have support for it. The ' and - are treated as special characters. They can only be part of a word if they have normal letters on both sides; otherwise things "like--this" would also be treated as a single word. For this reason special care needs to be taken when handling these special characters.
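
To make the rule above concrete, here is a minimal sketch in Python. It is not Aspell's actual tokenizer: the names SPECIAL and tokenize are made up for illustration, and str.isalpha() is only an assumed stand-in for Aspell's per-language notion of a "normal letter".

```python
# Hypothetical sketch of the rule described above; not Aspell's code.
# Assumption: str.isalpha() approximates "normal letter".
SPECIAL = {"'", "-"}

def tokenize(text):
    """Split text into words. A special character is kept inside a word
    only when it has a letter immediately on both sides."""
    words = []
    current = []
    n = len(text)
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in SPECIAL
              and current and current[-1].isalpha()
              and i + 1 < n and text[i + 1].isalpha()):
            # Letter on the left and letter on the right: part of the word.
            current.append(ch)
        else:
            # Any other character ends the current word.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words

print(tokenize("like--this"))
print(tokenize("hait-l'-ovraedje"))
```

Under this rule "like--this" splits into two words, as intended, but "hait-l'-ovraedje" also splits at the adjacent ' and -, which is the behaviour the original report is about.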