Re: Emacs i18n
From: Juri Linkov
Subject: Re: Emacs i18n
Date: Thu, 21 Mar 2019 23:45:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-linux-gnu)

> > Indeed, a complete implementation of all Russian morphological rules
> > takes ~1600 lines of dense Perl code:
>
> > http://www.linkov.net/files/nlp/Lingua-RU-Inflect.pm
>
> > I can't imagine how to include all these rules to gettext.
>
> I agree with you about that. What I propose is something else.
>
> 1. I do not propose implementing them all. Only some -- whichever ones
> we think are worth while.
>
> 2. I do not propose putting any of this in gettext.
> What I propose would be Emacs code that operates on the strings that
> come from gettext.

The misconception in your proposal is that it assumes purely algorithmic
inflection, whereas inflection in Russian is actually dictionary-based
(in addition to the algorithms that process words from the dictionary).
That is, to be able to inflect a word you need a large dictionary of all
words, where each entry has at least the following lexical properties:
- part of speech
- noun grammatical gender: masculine, feminine, neuter
- noun animacy: animate, inanimate
- inflection type
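A dictionary entry carrying these properties could be modeled as below.
This is a minimal Python sketch under stated assumptions: the class and
field names, the sample entries and the inflection-type labels are
hypothetical placeholders, not the scheme of any real morphological
dictionary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Gender(Enum):
    MASCULINE = "masculine"
    FEMININE = "feminine"
    NEUTER = "neuter"


@dataclass(frozen=True)
class LexicalEntry:
    """Minimal lexical properties needed before a word can be inflected."""
    lemma: str                # dictionary form (nominative singular for nouns)
    part_of_speech: str       # e.g. "noun", "adjective"
    gender: Optional[Gender]  # grammatical gender; None for non-nouns
    animate: Optional[bool]   # animacy; None for non-nouns
    inflection_type: str      # hypothetical label for the declension pattern


# Toy dictionary keyed by lemma; entries and labels are illustrative only.
DICTIONARY: Dict[str, LexicalEntry] = {
    "файл": LexicalEntry("файл", "noun", Gender.MASCULINE, False, "hard-stem-masc"),
    "пользователь": LexicalEntry("пользователь", "noun", Gender.MASCULINE, True,
                                 "soft-stem-masc"),
}


def lookup(word: str) -> LexicalEntry:
    """Return the entry for word; without an entry the word cannot be inflected."""
    try:
        return DICTIONARY[word]
    except KeyError:
        raise KeyError(f"no dictionary entry for {word!r}; cannot inflect") from None
```

Both sample nouns are masculine, yet "файл" (file) is inanimate and
"пользователь" (user) is animate, so they need different entries even
though neither property can be guessed from the spelling alone.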
The main parameters that determine declension are:
- grammatical case (one of the 6 basic cases: nominative, genitive,
  dative, accusative, instrumental, prepositional, plus some additional
  ones)
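- number: singular or plural. The dual is not a grammatical number in
  modern Russian; it only influences the choice of case for nouns
  after numerals:
  for 1 - nominative case, singular
  for 2..4 - genitive case, singular
  for 5.. - genitive case, plural
  (numbers ending in 11..14 follow the 5.. rule, and compound numbers
  such as 21 or 22 follow the rule for their last digit)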
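The numeral rule above maps a number to the case and number of the
following noun. A minimal Python sketch, assuming the whole phrase is in
the nominative (as in a message like "5 файлов"); the last-digit
refinement gives the same three buckets as the standard gettext
Plural-Forms expression for Russian. The function name and the sample
forms of "файл" are illustrative.

```python
from enum import Enum
from typing import Tuple


class Case(Enum):
    NOMINATIVE = "nominative"
    GENITIVE = "genitive"


class Number(Enum):
    SINGULAR = "singular"
    PLURAL = "plural"


def numeral_agreement(n: int) -> Tuple[Case, Number]:
    """Case and number of a noun after the numeral n (nominative phrase)."""
    n = abs(n)
    last, last_two = n % 10, n % 100
    if last == 1 and last_two != 11:                 # 1, 21, 101, ... but not 11
        return Case.NOMINATIVE, Number.SINGULAR
    if 2 <= last <= 4 and not 12 <= last_two <= 14:  # 2..4, 22..24, ... but not 12..14
        return Case.GENITIVE, Number.SINGULAR
    return Case.GENITIVE, Number.PLURAL              # 0, 5..20, 25..30, ...


# The forms themselves must still come from a dictionary.
FILE_FORMS = {
    (Case.NOMINATIVE, Number.SINGULAR): "файл",
    (Case.GENITIVE, Number.SINGULAR): "файла",
    (Case.GENITIVE, Number.PLURAL): "файлов",
}

for n in (1, 3, 5, 11, 21, 22):
    print(n, FILE_FORMS[numeral_agreement(n)])
# 1 файл, 3 файла, 5 файлов, 11 файлов, 21 файл, 22 файла (one per line)
```

Note that the function only chooses which form is needed; the forms
"файл", "файла" and "файлов" are looked up, not computed.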
An additional problem is that there are many exceptions: some words
have an additional form called the "count form"
https://en.wikipedia.org/wiki/Russian_declension#Count_form
For instance, the correct form is "5 байт" (literally "5 byte"), not
"5 байтов" ("5 bytes"): the general rule requires the genitive plural
after 5 for most words, but not for bytes.
Such exceptions are marked in the dictionary with a special
property that takes one of two values:
- mandatory: only the count form is allowed, for units of measure
  such as amperes, watts, volts, bits, bytes, etc.
- optional: both forms are accepted, for units such as angstroms,
  gauss, (kilo)grams, decibels, carats, microns, ohms, roentgens, etc.
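The count-form property could be consulted wherever the numeral rule
selects the genitive plural. A minimal Python sketch: the property and
its two values follow the text, while the class names, the data layout
and the prefer_count_form switch (for choosing between the two accepted
forms of an "optional" word) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class CountForm(Enum):
    MANDATORY = "mandatory"  # only the count form is allowed
    OPTIONAL = "optional"    # count form and genitive plural are both accepted


@dataclass(frozen=True)
class Noun:
    genitive_plural: str                    # regular form required after 5..
    count: Optional[str] = None             # the count form, if the word has one
    count_form: Optional[CountForm] = None  # the dictionary property


# Hypothetical dictionary fragment.
NOUNS: Dict[str, Noun] = {
    "файл": Noun("файлов"),
    "байт": Noun("байтов", "байт", CountForm.MANDATORY),
    "грамм": Noun("граммов", "грамм", CountForm.OPTIONAL),
}


def genitive_plural_after_numeral(lemma: str, prefer_count_form: bool = False) -> str:
    """Form of lemma to use where the numeral rule requires the genitive plural."""
    noun = NOUNS[lemma]
    if noun.count_form is CountForm.MANDATORY:
        return noun.count
    if noun.count_form is CountForm.OPTIONAL and prefer_count_form:
        return noun.count
    return noun.genitive_plural


print(f"5 {genitive_plural_after_numeral('байт')}")  # 5 байт
print(f"5 {genitive_plural_after_numeral('файл')}")  # 5 файлов
```

For "грамм" both "5 граммов" and "5 грамм" are accepted, so the sketch
leaves the choice to the caller.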