silpa-discuss

Re: [silpa-discuss] OSM Transliterator using SILPA


From: Vasudev Kamath
Subject: Re: [silpa-discuss] OSM Transliterator using SILPA
Date: Mon, 19 Oct 2015 22:32:16 +0530
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Hey,

Please use our list for discussion, so that if I don't reply someone
else will :-).

Aruna Sankaranarayanan <address@hidden> writes:

> Hey,

Sorry for the delayed reply; I was busy with other work.

>
> I am trying to create an automatic transliterator for OSM using
> SILPA. You probably know that if there's a proper noun like
> "Aurobindo" or "Vidhana Soudha" written in English, it won't get
> transliterated because it's not present in the CMU dictionary used by
> SILPA.

Glad to know someone is considering using it. It's been a while with no
real user base, which has reduced our motivation to work on it.

Now, coming to the words you mentioned above: they are not really
English words. I agree Aurobindo is a noun, but it is of Indian origin,
and Vidhana Soudha is clearly a Kannada word. I know about this problem
and we considered a method to fix it. After a bit of digging through old
mails, I think I recall it now.

1. Convert English text to IPA (International Phonetic Alphabet)
2. There is a rule for conversion from IPA to Indian languages, but it
   might need some touch-up. IIRC I was planning to do some fixing, but
   I don't remember the details now. Reading through the code should
   give you some idea.

So, for example, "Apple" in English becomes æpəl, and this can be
converted to any Indian language correctly.
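
To make the two steps concrete, here is a rough toy sketch. It is not
SILPA's code or API: every table and function name below is made up for
illustration, the tables only cover the phonemes of "apple", and mapping
æ to the Kannada letter AA is just one plausible choice. The only
outside fact it relies on is that CMU dictionary entries are ARPABET
phonemes with stress digits (for example AE1 P AH0 L).

```python
# Toy sketch of the two-step idea: English -> IPA -> Indian script.
# All names and tables are hypothetical; this is NOT SILPA's API or rule set.

# Step 1a: a CMU-style entry lists ARPABET phonemes with stress digits.
TOY_CMU = {"apple": ["AE1", "P", "AH0", "L"]}

# Step 1b: ARPABET -> IPA, covering only the phonemes used above.
ARPABET_TO_IPA = {"AE": "æ", "P": "p", "L": "l"}


def arpabet_to_ipa(phonemes):
    """Join ARPABET phonemes into an IPA string."""
    out = []
    for ph in phonemes:
        base = ph.rstrip("012")  # drop the stress digit, if any
        if base == "AH":
            # Unstressed AH is a schwa; stressed AH is the "strut" vowel.
            out.append("ə" if ph.endswith("0") else "ʌ")
        else:
            out.append(ARPABET_TO_IPA[base])
    return "".join(out)


def english_to_ipa(word):
    """Return IPA for a dictionary word, or None if the word is missing."""
    phonemes = TOY_CMU.get(word.lower())
    return None if phonemes is None else arpabet_to_ipa(phonemes)


# Step 2: IPA -> Kannada (illustrative table, not SILPA's rules).
# Each vowel maps to (independent letter, dependent vowel sign).
VOWELS = {
    "æ": ("\u0c86", "\u0cbe"),  # letter AA, vowel sign AA
    "ə": ("\u0c85", ""),        # letter A; inherent vowel needs no sign
}
CONSONANTS = {"p": "\u0caa", "l": "\u0cb2"}  # letters PA and LA
VIRAMA = "\u0ccd"  # suppresses the inherent vowel of a consonant


def ipa_to_kannada(ipa):
    """Convert an IPA string to Kannada script using the toy tables."""
    out = []
    bare = False  # True while the last consonant has no vowel yet
    for sym in ipa:
        if sym in CONSONANTS:
            if bare:
                out.append(VIRAMA)  # consonant cluster
            out.append(CONSONANTS[sym])
            bare = True
        elif sym in VOWELS:
            independent, sign = VOWELS[sym]
            out.append(sign if bare else independent)
            bare = False
        else:
            raise ValueError(f"no rule for IPA symbol {sym!r}")
    if bare:
        out.append(VIRAMA)  # word-final consonant
    return "".join(out)


ipa = english_to_ipa("Apple")
print(ipa, ipa_to_kannada(ipa))
print(english_to_ipa("Aurobindo"))  # not in the toy dictionary
```

With these toy tables "Apple" goes to æpəl and then to ಆಪಲ್, while
"Aurobindo" returns None at step 1 because the dictionary has no entry
for it.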

But the two words you mentioned won't convert to IPA. This is mainly
because they are written in a pronounceable form but without IPA
characters.
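
If it helps to see how many OSM names are affected, something like the
following could split a list of names into those the dictionary covers
and those it doesn't. This is only a sketch: `split_by_coverage` and the
tiny word set are hypothetical stand-ins, not part of SILPA or the CMU
dictionary tooling.

```python
# Hypothetical helper: check which names a pronunciation dictionary covers.
# A name counts as covered only if every one of its words has an entry.
def split_by_coverage(names, dictionary):
    covered, missing = [], []
    for name in names:
        words = name.lower().split()
        if all(word in dictionary for word in words):
            covered.append(name)
        else:
            missing.append(name)
    return covered, missing


# Stand-in for the set of words present in the real dictionary.
toy_dictionary = {"apple", "road", "park"}

covered, missing = split_by_coverage(
    ["Apple Road", "Vidhana Soudha", "Aurobindo"], toy_dictionary
)
print("covered:", covered)
print("missing:", missing)
```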

> One of the things I am interested in doing is trying to extend the CMU
> dictionary for OSM. This would involve forking
> libindic/Transliteration and then adding unique proper nouns from OSM
> to the dictionary along with the pronunciation. Do you think there's
> an easy/alternative/better way to do this?

Look at my proposal above. Also, I remember there was an issue in the
English-to-IPA module, which I filed here ¹, and someone sent us a PR as
part of GSoC 14, but I probably missed it ². So if you consider working
on this, please review the PR and see if it works as expected. If so,
let us know and we will merge it so you can work on the further part.

¹ https://github.com/libindic/Transliteration/issues/3
² https://github.com/libindic/Transliteration/pull/6
