Re: [Aspell-user] Looking for usage advice
From: Christoph Hintermüller
Subject: Re: [Aspell-user] Looking for usage advice
Date: Sun, 30 Jan 2005 23:02:43 +0100
User-agent: KMail/1.6.2
On Sunday, 30 January 2005 20:31, Grzegorz Adam Hankiewicz wrote:
First question: which Aspell version do you use? 0.33 or earlier, 0.50.X, or
even 0.60.X?
[...]
> The book is written in XML. This is not a problem for aspell and its
> html/sgml mode. Each chapter is stored in a separate file, and so far
> translators have gone individually through each file. When the file
> has gone through an initial translation, we fire up aspell on it.
>
> Since the text contains lots of technical words not included in the
> default dictionaries, and also sometimes text in English which is
> left verbatim, aspell reports many false positives which have to
> be ignored.
Are the English citations in the file at least marked by some recognizable
delimiters (XML tags), like
<en> this is some english text </en>
or
"[...] this is some english text [...]"
or something similar? If the delimiter pairs are unique and the texts in
different languages do not overlap, Aspell 0.60.X could help ease the
spell-checking process, as it has a context filter that separates a text into
two different contexts, one visible and one invisible, as long as the two are
separated by at least one pair of delimiters. In this case a two-pass spell
check would do the trick: one pass with the initial context visible, to check
the Spanish text, and one pass with the initial context invisible to Aspell,
to check the English text. If you collect all the settings for both passes in
the mode files spell-en and spell-es, including the proper selection of the
English and Spanish dictionaries, then the following calls would do the trick:
aspell --mode spell-es -c <file-to-spell>
aspell --mode spell-en -c <file-to-spell>
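To illustrate what the two passes see, here is a minimal Python sketch of a two-context filter. It is not Aspell's implementation: the function name `context_filter` and the choice to blank hidden text with spaces (so that word offsets and line numbers stay unchanged) are assumptions made for illustration only.

```python
def context_filter(text, delimiters, initial_visible=True):
    """Sketch of a two-context filter (hypothetical, not Aspell's code).

    `delimiters` is a list of (open, close) pairs. Text outside any pair
    belongs to the initial context; text between an opening delimiter and
    its matching close belongs to the other context. Hidden text and the
    delimiters themselves are replaced by spaces, so offsets stay intact.
    """
    out = []
    i = 0
    visible = initial_visible
    closing = None  # closing delimiter we are waiting for, if inside a pair
    while i < len(text):
        if closing is None:
            # Check whether an opening delimiter starts at position i.
            match = next((d for d in delimiters if text.startswith(d[0], i)), None)
            if match is not None:
                out.append(" " * len(match[0]))
                i += len(match[0])
                closing = match[1]
                visible = not initial_visible  # switch to the other context
                continue
        elif text.startswith(closing, i):
            out.append(" " * len(closing))
            i += len(closing)
            closing = None
            visible = initial_visible  # back to the initial context
            continue
        ch = text[i]
        # Keep newlines so line numbers stay stable; blank other hidden chars.
        out.append(ch if visible or ch == "\n" else " ")
        i += 1
    return "".join(out)


pairs = [("<screen>", "</screen>"), ("<quote>", "</quote>")]
sample = "Hola <screen>ls -l</screen> mundo"
print(context_filter(sample, pairs, initial_visible=True).split())   # Spanish pass
print(context_filter(sample, pairs, initial_visible=False).split())  # English pass
```

With the initial context visible only the Spanish words remain; with it invisible only the tagged English part remains, which is the split the two mode files are meant to produce.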
If simple context switching is not suitable, but there is still a set of rules
for distinguishing the parts to check with the English dictionary, the parts
to check with the Spanish dictionary, and the parts not to check at all, then
with 0.60.X you can write your own text filter to do the job.
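The rule-based variant can be sketched in Python as a function that routes each part of the text to a language, or to no check at all, based on the enclosing tag. This only illustrates the idea and is not Aspell's filter API: the routing table `RULES` (including the `programlisting` tag), `DEFAULT_LANG` and the function names are hypothetical, and tags are assumed not to nest.

```python
import re

# Hypothetical routing rules: tag name -> language code, or None for "skip".
RULES = {"screen": "en", "quote": "en", "programlisting": None}
DEFAULT_LANG = "es"

# Matches opening or closing tags, allowing attributes on opening tags.
TAG_RE = re.compile(r"<(/?)(\w+)[^>]*>")


def _emit(views, chunk, lang):
    """Append `chunk` to the view of `lang` and a blanked copy to all others."""
    blank = re.sub(r"[^\n]", " ", chunk)
    for view_lang, parts in views.items():
        parts.append(chunk if view_lang == lang else blank)


def split_by_language(text, rules=RULES, default=DEFAULT_LANG):
    """Return {lang: view}; each view blanks everything not in that language."""
    langs = {default} | {v for v in rules.values() if v}
    views = {lang: [] for lang in langs}
    current = default
    pos = 0
    for m in TAG_RE.finditer(text):
        _emit(views, text[pos:m.start()], current)
        _emit(views, m.group(0), None)  # tags themselves are never checked
        closing, name = m.group(1), m.group(2)
        if name in rules:
            current = default if closing else rules[name]
        pos = m.end()
    _emit(views, text[pos:], current)
    return {lang: "".join(parts) for lang, parts in views.items()}


views = split_by_language("Hola <quote>hello</quote> mundo <programlisting>x=1</programlisting>")
for lang, view in sorted(views.items()):
    print(lang, view.split())
```

Each view has the same length as the input, so any position reported while checking one view maps directly back to the original file.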
>
> Once the translator has gone through the XML file with aspell, we
> create an "ignore" file from the spell checked document. This ignore
> file is created with the list command and piped to a hidden file. On
> posterior aspell runs, this hidden file is converted into a custom
> dictionary (with "create master") and added on the commandline.
>
Again, in 0.60.X you could collect all your settings in a mode file and call
aspell similarly to the examples above.
[...]
> Possibly the best improvement we could find is if aspell was able
> to recognise different languages in the document being scanned.
> Reading the mailing list archives I've found out that this feature is
> not planned due to the intrinsic difficulty of detecting correctly
> a language. However, in the kind of documents we translate
> usually english text is left alone in specific tags, like <screen>
> or <quote>.
See above. The following command-line examples hint at what the mode files
spell-en and spell-es could contain. I do not mention the parameters for the
ignore file, as it has to be split into an es-ignore and an en-ignore file.
The '\' character in the following lines only marks that the command continues
on the next line; do not add it literally.
aspell --add-filter context --add-context-delimiters "<screen> </screen>" \
--add-context-delimiters "<quote> </quote>" --language-tag en \
--dont-context-initial-visible -c <file-to-spell>
aspell --add-filter context --add-context-delimiters "<screen> </screen>" \
--add-context-delimiters "<quote> </quote>" --language-tag=es \
--context-initial-visible -c <file-to-spell>
Sadly, this only works with Aspell 0.60.X. Furthermore, I do not know exactly
whether the context filter runs before the XML filter or whether it is part of
the XML mode; as far as I know it is. Therefore I didn't add any
--rem-all-context-delimiters to the lines above.
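If you would rather script the two passes than type them, a small Python sketch can build the argument lists and hand them to `subprocess.run`. The option names are copied verbatim from the commands above and may differ between Aspell versions, so check them against your installed version before relying on them. The helper names `aspell_pass_args` and `run_two_passes` are hypothetical.

```python
import subprocess

# Delimiter pairs from the example commands above; extend as needed.
DELIMITERS = [("<screen>", "</screen>"), ("<quote>", "</quote>")]


def aspell_pass_args(lang, initial_visible, path, delimiters=DELIMITERS):
    """Build the argv for one spell-check pass (flags as written in the mail)."""
    args = ["aspell", "--add-filter", "context"]
    for open_tag, close_tag in delimiters:
        # A list argument needs no shell quoting, even with the space inside.
        args += ["--add-context-delimiters", f"{open_tag} {close_tag}"]
    args.append(f"--language-tag={lang}")
    args.append("--context-initial-visible" if initial_visible
                else "--dont-context-initial-visible")
    args += ["-c", path]
    return args


def run_two_passes(path):
    """Spanish pass (tagged parts hidden), then English pass (only tagged parts)."""
    for lang, visible in (("es", True), ("en", False)):
        # Interactive check: aspell needs the terminal, so no output capture.
        subprocess.run(aspell_pass_args(lang, visible, path), check=True)


print(" ".join(aspell_pass_args("en", False, "chapter1.xml")))
```

Passing a list instead of a shell string avoids quoting problems with delimiters such as `"<screen> </screen>"`, and it makes the two passes identical except for the language and the initial visibility.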
>
> Possibly heresy in itself, it could be useful if aspell had a basic
> XML scanner and was more aware of the format of the document it is
> parsing, providing user customised hooks whenever specific tags are
> found. By basic I mean really dumb word matching: if the tag <quote
> is found and the user specified this as a hook, aspell could maybe
> change to another dictionary on the fly, prompt the user whether
> this change is OK (showing it at the same time on the screen for the
> user to judge), maybe pipe the bit of text to yet another program,
> etc, until the byte sequence </quote> had been found.
>
Why? There is not only XML; there are other text file formats too. Thus I
think it would be better to add a multi-context filter to aspell. This filter
should be able not only to distinguish visible and invisible text, but also to
handle a visible context within an invisible one and an invisible context
within a visible one.
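The proposed multi-context filter does not exist in Aspell; the following Python sketch only illustrates the nesting behaviour described above. It assumes, as a design choice of this sketch, that every nesting level flips visibility, and it tracks the open contexts on a stack of expected closing delimiters. The name `multi_context_filter` is hypothetical.

```python
def multi_context_filter(text, delimiters, initial_visible=True):
    """Hypothetical nested-context filter: each nesting level flips visibility.

    Hidden text and all delimiters are replaced by spaces (newlines are kept),
    so the output has the same length and line structure as the input.
    """
    opens = dict(delimiters)  # opening delimiter -> matching closing delimiter
    out = []
    stack = []  # closing delimiters of the currently open contexts
    i = 0
    while i < len(text):
        # Closing the innermost context takes priority over opening a new one.
        if stack and text.startswith(stack[-1], i):
            token = stack.pop()
        else:
            token = next((o for o in opens if text.startswith(o, i)), None)
            if token is not None:
                stack.append(opens[token])
        if token is not None:
            out.append(" " * len(token))
            i += len(token)
            continue
        # Even depth: initial context; odd depth: the opposite context.
        visible = initial_visible ^ (len(stack) % 2 == 1)
        ch = text[i]
        out.append(ch if visible or ch == "\n" else " ")
        i += 1
    return "".join(out)


pairs = [("<q>", "</q>"), ("<es>", "</es>")]
sample = "uno <q>two <es>tres</es> four</q> cinco"
print(multi_context_filter(sample, pairs, True).split())
print(multi_context_filter(sample, pairs, False).split())
```

Here the Spanish word nested inside the English quote becomes visible again at depth two, which a plain two-context filter cannot express.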
[...]
As a final question: are you able to switch to aspell 0.60.X if you are not
using it already?
cu
Xris