[aspell-devel] Hyphenation and compound words


From: Lars Aronsson
Subject: [aspell-devel] Hyphenation and compound words
Date: Sun, 11 Apr 2004 22:43:31 +0200 (CEST)

Maybe the key to compound words is in the hyphen.

Languages with many compound words also frequently use a special form
of hyphenation.  In English, one example of this case would be "wooden
and brick buildings", which in German is written "Holz- und
Backsteingebäude". Note that this hyphen has nothing to do with line
breaks.  This kind of hyphen most often appears before "and" or "or",
or (in lists) before the comma (Holz-, Backstein- und Stahlgebäude).
(There are a few more cases, e.g. "wooden rather than brick
buildings", where "rather than" takes the place of "and".)

In compound words where a glue letter is used, the glue letter appears
before the hyphen.  In compound words where a special form of the
first word is used, this special form appears before the hyphen even
though it would not be allowed as a word by itself.

In Swedish, a typical noun (such as "girl") has eight different
forms:

  flicka       - singular, nominative, indefinite   =     girl
  flickan      - singular, nominative, definite     = the girl
  flickas      - singular, genitive, indefinite     =     girl's
  flickans     - singular, genitive, definite       = the girl's
  flickor      - plural, nominative, indefinite     =     girls
  flickorna    - plural, nominative, definite       = the girls
  flickors     - plural, genitive, indefinite       =     girls'
  flickornas   - plural, genitive, definite         = the girls'

Added to this, however, is the form used in compound words, "flick-",
as in flick-cykel (girl's bicycle) and flick-aktig (girl-ish).  A shop can
advertise new models of "flick- och pojkcyklar" (girls' and boys'
bicycles).

To cover Swedish (and Danish and Norwegian, and probably German), it
would be sufficient to distinguish the hyphenated form (flick-) as a
legal word of its own and the only legal prefix for compound words.
Thus, the typical noun would have nine different forms rather than
eight.

Just like English "sheep" (plural: sheep), there are many words in
these languages where some of the nine forms coincide.  In many cases,
the hyphenated form coincides with the singular-nominative-indefinite
form (often called "the basic form").  Still, when adding a word to
the dictionary, a form with nine fields would be generally applicable
for Swedish, Danish, and Norwegian.
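
As a rough illustration of such a nine-field entry, here is a minimal
Python sketch.  The class name NounEntry, the field names, and the
word_list() helper are all made up for this example and are not part
of aspell; the only convention taken from the text above is that the
compound form is stored with its trailing hyphen.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class NounEntry:
    """Hypothetical nine-field noun record (not an aspell structure)."""
    sg_nom_indef: str
    sg_nom_def: str
    sg_gen_indef: str
    sg_gen_def: str
    pl_nom_indef: str
    pl_nom_def: str
    pl_gen_indef: str
    pl_gen_def: str
    compound: str  # prefix form, stored with its trailing hyphen

    def word_list(self):
        """Return the distinct forms in field order, one per word-list line."""
        distinct = []
        for form in astuple(self):
            if form not in distinct:  # forms may coincide, as with "sheep"
                distinct.append(form)
        return distinct


flicka = NounEntry(
    "flicka", "flickan", "flickas", "flickans",
    "flickor", "flickorna", "flickors", "flickornas",
    "flick-",
)
```

Calling flicka.word_list() gives the nine entries for a flat word
list; for a noun where some forms coincide, the list is simply
shorter.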

In the old aspell dictionary format (all words listed), it would
suffice to list the nine forms:

  flicka
  flickas
  flickan
  flickans
  flickor
  flickors
  flickorna
  flickornas
  flick-

and the "flick-" form could be freely used as a prefix in compound
words.  Any word could be used as a suffix in compound words.
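
To make the rule concrete, here is a minimal Python sketch of how a
checker might treat such a word list.  It is not aspell code: the
function names load_words() and is_valid() are invented for this
example, and it assumes that an entry ending in a hyphen is a
compound prefix, that a compound may be written with or without the
hyphen, and that only listed prefix forms may stand alone with a
trailing hyphen.

```python
def load_words(entries):
    """Split word-list entries into whole words and compound prefixes."""
    words, prefixes = set(), set()
    for entry in entries:
        if entry.endswith("-"):
            prefixes.add(entry[:-1])  # store "flick-" as "flick"
        else:
            words.add(entry)
    return words, prefixes


def is_valid(token, words, prefixes):
    """Accept whole words, bare prefix forms, and prefix + word compounds."""
    # A trailing hyphen (as in "flick- och pojkcyklar") is only legal
    # for a form that the dictionary lists with a hyphen.
    if token.endswith("-"):
        return token[:-1] in prefixes
    if token in words:
        return True
    # Otherwise try every split: listed prefix, optional hyphen, valid rest.
    for i in range(1, len(token)):
        head, tail = token[:i], token[i:]
        if head in prefixes:
            if tail.startswith("-"):
                tail = tail[1:]
            if tail and is_valid(tail, words, prefixes):
                return True
    return False
```

With a word list containing flicka, flick-, pojk-, cykel and cyklar,
this sketch accepts "flick-cykel", "pojkcyklar" and a bare "flick-",
but rejects "flickacykel" and "flicka-", because "flicka" is not
listed as a prefix form.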

Could we have this support for "dictionary words ending in hyphen"
implemented?  It would be a great help in designing better
dictionaries for these languages.


-- 
  Lars Aronsson (address@hidden)
  Aronsson Datateknik - http://aronsson.se/