[Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...


From: Martin Swift
Subject: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
Date: Thu, 22 Feb 2007 00:40:43 +0900

Dear aspell community,

For a while now, I've been unable to properly install the Icelandic and German aspell libraries. I get a torrent of warnings similar to:
  Warning: The string "Grímseyjar" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímsson" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímssonar" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.
  Warning: The string "Grímssyni" is invalid. Invalid UTF-8 sequence at position 3. Skipping string.

It seems that the problem arises in every string that contains a non-ASCII character. For languages such as those mentioned, which contain a lot of such characters, the resulting library is pretty much useless.
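
One way to check whether a word really is valid UTF-8 is to look at its raw bytes. The following is a minimal Python sketch, not anything aspell itself provides; the function name `first_invalid_utf8_offset` is made up for illustration. It assumes, without confirmation from the warnings above, that the word list may be stored in a single-byte encoding such as ISO-8859-1.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the 0-based byte offset of the first invalid UTF-8
    sequence in data, or None if data is valid UTF-8.
    (Hypothetical helper, not part of aspell.)"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded
        return err.start
    return None


# The same word stored in two encodings (the ISO-8859-1 case is an assumption)
latin1_word = "Grímsson".encode("iso-8859-1")
utf8_word = "Grímsson".encode("utf-8")

print(first_invalid_utf8_offset(latin1_word))  # offset of the lone 0xED byte
print(first_invalid_utf8_offset(utf8_word))    # None: valid UTF-8
```

For the ISO-8859-1 bytes the reported offset is 2, i.e. the third byte, which would match "position 3" only if aspell counts from 1; that is a guess and has not been verified here.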

A while ago, I tried everything I could think of: I grabbed the source files and converted some of them in different directions with iconv, but without luck. Eventually I gave up, figuring that my spelling wasn't *that* bad. Some time later, I've still found nothing on this, and the Gentoo community doesn't seem to recognize where the issue may lie.
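
For anyone repeating the conversion experiment, here is a rough Python equivalent of an iconv pass, written as a sketch under stated assumptions: the source encoding is guessed to be ISO-8859-1, and the helper name `to_utf8` is invented for this example. It leaves data that is already valid UTF-8 untouched, because re-encoding such data would double-encode every non-ASCII character.

```python
def to_utf8(data: bytes, assumed_source: str = "iso-8859-1") -> bytes:
    """Return data as UTF-8 bytes.
    If data already decodes as UTF-8 it is returned unchanged;
    otherwise it is decoded using assumed_source (an assumption
    about the input, not something detected) and re-encoded."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8, do not convert twice
    except UnicodeDecodeError:
        return data.decode(assumed_source).encode("utf-8")


# Example with in-memory bytes; for a file, read and write in binary mode.
raw = "Grímseyjar\nGrímsson\n".encode("iso-8859-1")
converted = to_utf8(raw)
print(converted.decode("utf-8"))
```

This only changes the bytes of the word list. Whether aspell then expects UTF-8 input at that stage of the build is a separate question that this sketch does not answer.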

The archives for this list also seem rather silent, so I wanted to see if anyone had any advice on what I could try in order to fix the problem, or at least diagnose it. If not, well, then at least it's here for the next one looking...

Thanks,
Martin

$ aspell --version
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.4)

$ locale
LANG=is_IS.utf8
LC_CTYPE="is_IS.utf8"
LC_NUMERIC="is_IS.utf8"
LC_TIME="is_IS.utf8"
LC_COLLATE="is_IS.utf8"
LC_MONETARY="is_IS.utf8"
LC_MESSAGES=en_GB.utf8
LC_PAPER="is_IS.utf8"
LC_NAME="is_IS.utf8"
LC_ADDRESS="is_IS.utf8"
LC_TELEPHONE="is_IS.utf8"
LC_MEASUREMENT="is_IS.utf8"
LC_IDENTIFICATION="is_IS.utf8"
LC_ALL=

$ grep UTF /etc/locale.gen
en_GB.UTF-8 UTF-8
en_CA.UTF-8 UTF-8
en_US.UTF-8 UTF-8
is_IS.UTF-8 UTF-8
de_DE.UTF-8 UTF-8
ja_JP.UTF-8 UTF-8
