[aspell-devel] How to Implement Better Support for Compound Words


From: Kevin Atkinson
Subject: [aspell-devel] How to Implement Better Support for Compound Words
Date: Mon, 26 Apr 2004 17:04:31 -0400 (EDT)

[Anyone who expressed an interest in compound support has been CC'd.]

Based on the feedback I have received so far, acceptable support for
compound words involves two basically independent parts.

If this is not sufficient please let me know.

PART ONE)

Describes how the word needs to be changed when forming a compound:

CMP <flag> <strip> <add> <cond> <cond2>

<flag>  is the compound flag
<strip> is the string to strip or 0 for the null string
<add>   is the string to add or 0 for the null string
<cond>  is the condition to match at the end of the current word
<cond2> is the condition to match at the beginning of the next word

All but the last field are the same as in a suffix entry in the existing
affix code.

<cond> is a simplified regular expression.  Some examples:
  . (for anything)
  e
  [^aeiou]y
  [^ey]
  [aeiou]y
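
To make the CMP entry concrete, here is a minimal Python sketch of how
such a rule might be parsed and applied.  It is not Aspell code.  The
names CmpRule, parse_cmp and join_compound are made up for
illustration, the conditions are handed to Python's re module on the
assumption that the simplified syntax is a subset of it, and the two
German-style rules in the usage note are only examples:

```python
import re
from dataclasses import dataclass


@dataclass
class CmpRule:
    flag: str    # compound flag
    strip: str   # string to strip from the end of the current word
    add: str     # string to add in its place
    cond: str    # condition on the end of the current word
    cond2: str   # condition on the beginning of the next word


def parse_cmp(line):
    """Parse 'CMP <flag> <strip> <add> <cond> <cond2>' into a CmpRule."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "CMP":
        raise ValueError("not a CMP entry: %r" % line)
    _, flag, strip, add, cond, cond2 = fields
    # '0' stands for the null string in the strip and add fields.
    strip = "" if strip == "0" else strip
    add = "" if add == "0" else add
    return CmpRule(flag, strip, add, cond, cond2)


def join_compound(rule, word, next_word):
    """Return the joined compound, or None if the rule does not apply."""
    if not word.endswith(rule.strip):
        return None
    # Assumption: <cond> is anchored at the end of the current word.
    if re.search("(?:" + rule.cond + ")$", word) is None:
        return None
    # Assumption: <cond2> is anchored at the start of the next word.
    if re.match(rule.cond2, next_word) is None:
        return None
    stem = word[:len(word) - len(rule.strip)]
    return stem + rule.add + next_word
```

For example, join_compound(parse_cmp("CMP S 0 s e ."), "liebe", "brief")
adds a linking "s", and join_compound(parse_cmp("CMP E e 0 e ."),
"schule", "hof") drops the final "e".  A rule whose <cond> does not
match the end of the word returns None.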

Question: Is it ever the case that the beginning of a word needs to be
changed when forming compounds?

PART TWO)

Describes the position a word can appear in (beginning, middle, or
end) and with which words.

To do this, each word can be assigned a category.  Then each category
can be given a set of rules to describe how it can be used in a
compound word.  For example:

  A + B: indicates that a category A word may appear at the beginning
    of a word when followed by a category B word.  When combined it is
    then considered a category B word.
  A + C + B: here a C word may only appear between an A and a B word
  A + A + B
  A + A
  A + A + A
  etc.

I have not decided if a word should be allowed to belong to more than
one category, as a new category can be created if necessary to mean
words in both category A and B, for example.

TO IMPLEMENT:

*) expand the affix code to support special compound flags as
   described in part one

*) write code to store the conditions as described in part two

*) expand the compound checking code to check against the conditions

*) expand the dictionary format to store the necessary compound info
   with the word

If anyone would like to try implementing this let me know.  If you can
get it done by the end of June or so it will probably make it into
Aspell 0.60.

--- 
http://kevin.atkinson.dhs.org
