Re: [Aspell-user] What letters belongs to a word? (issue with Turkish "ı")

From:	Kevin Atkinson
Subject:	Re: [Aspell-user] What letters belongs to a word? (issue with Turkish "ı")
Date:	Fri, 9 Sep 2011 19:46:58 -0600 (MDT)
User-agent:	Alpine 2.00 (BSF 1167 2008-08-23)

On Fri, 9 Sep 2011, Daniel wrote:

Agustin Martin <agustin.martin <at> hispalinux.es> writes:

On Wed, Sep 07, 2011 at 09:47:01AM +0000, Daniel wrote:

When I run my aspell (through emacs), the Turkish letter "ı" is not
considered to be part of words; I'm given suggestions "Bostanc",
when I actually wrote "Bastancı". I am given no suggestion for
"İstanbulı", since "ı" actually _is_ recognized as part of the
word... How can I tell aspell to include "ı" in words?


Which aspell and emacs version are you using?


This is emacs 23.3.1 and aspell 0.60.6.1.

To clarify, it is the Turkish "lowercase dotless i" that Aspell doesn't
recognize as part of a word. This is when I run aspell on my text using
an English dictionary (from aspell-en package). From within Emacs, or
with Aspell alone. I do pass --encoding=UTF-8, but it doesn't seem
necessary (it detects my locale, right?).

But when I try with a Turkish dictionary, it does work. Then the
dotless-i is indeed part of the word. Probably because the tr.dat has
"charset iso8859-9", while en.dat has "charset iso8859-1". I didn't look
in the dat-files before. But this is a bit silly; I am using UTF-8!
Is this an artifact of the old non-UTF-8 Aspell, which still needs to tie
a (narrow) character set to the dictionary it is spelling with?

More or less. Please see http://aspell.net/man-html/Notes-on-8_002dbit-Characters.html. That being said, the problem you are facing is not just because the dictionary is 8-bit, but also because I convert the document to the 8-bit encoding before I tokenize it. The latter is something I plan to eventually fix.

If you really want to be able to recognize Turkish words when using the English dictionary, then you can try the attached special character set. Unzip the contents in `aspell config data-dir`, then change "charset iso8859-1" to "charset iso8859-1-u" in en.dat.
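To illustrate why narrowing the text to the dictionary's 8-bit character set before tokenizing can cut a word short, here is a rough Python sketch. It is not Aspell's code: the helper names `unencodable` and `tokenize_after_8bit` are made up for this example, and replacing an unconvertible character with "?" is only an assumption about how such a character might end up outside the word.

```python
import re


def unencodable(text, charset):
    """Return the characters of `text` that `charset` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def tokenize_after_8bit(text, charset):
    """Narrow `text` to `charset` first, then split it into runs of letters.

    Hypothetical stand-in for "convert, then tokenize": characters the
    charset lacks become "?", which is not a letter and so ends the word.
    """
    narrowed = text.encode(charset, errors="replace").decode(charset)
    return re.findall(r"[^\W\d_]+", narrowed)


word = "Bostanc\u0131"  # ends in U+0131, the Turkish lowercase dotless i

print(unencodable(word, "iso8859-1"))          # charset named in en.dat
print(unencodable(word, "iso8859-9"))          # charset named in tr.dat
print(tokenize_after_8bit(word, "iso8859-1"))
print(tokenize_after_8bit(word, "iso8859-9"))
```

Under these assumptions the ISO-8859-1 path yields the truncated token "Bostanc", while the ISO-8859-9 path keeps the whole word, which matches the behaviour reported with the English and Turkish dictionaries.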

However, even if Aspell did recognize the word correctly, it would be unlikely to do what you want when using the English dictionary because special rules are needed to handle the Turkish ı when changing case.
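The case problem can be seen with Python's default, locale-independent case mapping, which follows the same non-Turkish rules an English setup would use. The helpers `turkish_lower` and `turkish_upper` below are hypothetical, written only for this sketch, and cover just the two i/ı pairs; they say nothing about how Aspell handles case internally.

```python
def turkish_lower(s):
    """Lowercase with the Turkish pairing I -> ı and İ -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


def turkish_upper(s):
    """Uppercase with the Turkish pairing i -> İ and ı -> I."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


dotless = "\u0131"  # ı

# Default mapping: ı uppercases to plain I, and I lowercases to dotted i,
# so a round trip does not return the original letter.
print(dotless.upper().lower() == dotless)

# Turkish-aware mapping keeps the dotted and dotless letters apart.
print(turkish_lower(turkish_upper(dotless)) == dotless)
print(turkish_upper("istanbul"))
print(turkish_lower("ISPARTA"))
```

With the default mapping the round trip through uppercase turns ı into i, so a capitalized or all-caps form of a Turkish word would be matched against the wrong lowercase spelling.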

iso-8859-1-u.zip
Description: Zip archive
