aspell-user

Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...


From: Martin Swift
Subject: Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
Date: Tue, 6 Mar 2007 13:21:12 +0900
User-agent: Mutt/1.5.13 (2006-08-11)

Sorry for the late reply. I wrote this a while ago but held the draft
back to get a friend's opinion on the first bit.

On Sat, Mar 03, 2007 at 06:30:59AM -0700, Kevin Atkinson wrote:
> On Sat, 3 Mar 2007, Martin Swift wrote:
> 
> >On Sat, Mar 03, 2007 at 04:29:15AM -0700, Kevin Atkinson wrote:
> >>The word list is likely in iso-8859-1 but Aspell expects it in utf-8.
> 
> >Does this mean that aspell expects the word lists to have the same
> >charset as the machine? Isn't that a little odd?
> 
> I don't understand the question.

The question refers to the bit quoted from the website, not the bit
that is quoted here at the top. Without the proper context (that it is
referring to 'data-encoding'), that passage is hard to decipher. Unable
to make sense of the paragraph's first sentence, I focused harder on
it rather than reading around it. Fortunately, I was shown the error of
my ways and things are finally clearing up.

> >de.dat sets 'charset' as iso-8859-1:
> >
> > # cat de.dat
> > # Generated with Aspell Dicts "proc" script version 0.50.1
> > name de
> > charset iso-8859-1
> > soundslike de
> > affix      de
> >
> >Does aspell not use this to determine the charset? If not, /shouldn't/
> >it?
> 
> Yes it should.

OK, now we're cooking. So de.dat declares the word list files to be
in iso-8859-1, and this should direct aspell to treat them as such.
Good.
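
To make that concrete, here is a minimal Python sketch of reading the
'charset' key out of a language data file like the de.dat quoted
above. It is only an illustration of the idea, not how aspell itself
does it; the function name read_dat_charset and its 'default' argument
are made up for this example.

```python
def read_dat_charset(text, default=None):
    """Return the value of the 'charset' key in .dat-style text.

    Hypothetical helper for illustration only; it is not part of aspell.
    Lines are treated as "key value" pairs, and '#' starts a comment line.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0] == "charset":
            return parts[1].strip()
    return default


# Contents of de.dat as quoted in this thread.
DE_DAT = """\
# Generated with Aspell Dicts "proc" script version 0.50.1
name de
charset iso-8859-1
soundslike de
affix      de
"""

if __name__ == "__main__":
    print(read_dat_charset(DE_DAT))
```

With the de.dat contents above, the helper finds "iso-8859-1", which is
the encoding the word list should then be read in.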

> >I just tried
> >
> > /usr/bin/prezip-bin -d < de-common.cwl | /usr/bin/aspell --lang=de create 
> > --encoding=iso8859-1 master ./de-common.rws
> 
> Something is wrong.   The "--encoding=iso8859-1" should not be necessary. 
> It should be using the value from "charset" in "de.dat".  Try setting your 
> locale to "C" and see if it makes a difference.

I tried both LC_ALL="C" and LC_ALL="POSIX", but no go. Should I try
generating an ISO 8859-1 locale and running with that set? Do you have
any other suggestions?
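
For anyone wanting to check which encoding a word list is really in,
here is a small Python sketch that reports the byte offset of the first
invalid UTF-8 sequence. It is my own illustration of the kind of check
behind the error in the subject line, not aspell's code; the name
first_invalid_utf8 and the sample word are assumptions.

```python
def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration only; it is not part of aspell.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset of the first offending byte
    return None


# Sample German word "Groesse" with o-umlaut and sharp s, written with
# escapes so this file's own encoding does not matter.
WORD = "Gr\u00f6\u00dfe"

LATIN1_BYTES = WORD.encode("iso-8859-1")  # what the word list contains
UTF8_BYTES = WORD.encode("utf-8")         # what a UTF-8 reader expects

if __name__ == "__main__":
    print(first_invalid_utf8(LATIN1_BYTES))
    print(first_invalid_utf8(UTF8_BYTES))
    # Decoding with the declared charset recovers the word.
    print(LATIN1_BYTES.decode("iso-8859-1") == WORD)
```

The ISO 8859-1 bytes fail as soon as the first non-ASCII letter is
reached, while the same bytes decode cleanly once the charset declared
in de.dat is used.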

> >A couple of questions:
> >
> > Is this going to conflict with my machine's character encoding, or
> >has aspell created an rws file for a utf-8 system?
> 
> No.  Aspell will convert between encoding as necessary.

Great!

> > Is the machine character encoding check a feature? It really seems
> >that since one might attempt to install the same wordlist on machines
> >with different character encodings that this is prone to failure.
> 
> Hu?

:-) Never mind. That stems from my confusion about that paragraph in
the user's manual.

Cheers,
Martin

-- 
✌



