silpa-discuss

Re: [silpa-discuss] Changes made as part of GSoC - Call for Suggestions


From: Balasankar C
Subject: Re: [silpa-discuss] Changes made as part of GSoC - Call for Suggestions
Date: Thu, 1 Sep 2016 10:39:19 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.2.0

Hi,

[Don't CC me. I am subscribed to the list]

>
> I suggest that unless we're storing state, the object oriented approach is an
> overkill. chardetails should ideally be a function, that too in some
> libindic.util library.

I agree with the concept of a common utility module. I am not sure about
chardetails being a part of that. TBH, chardetails is a module I use heavily,
especially due to the dual encoding of Chillus. :D
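
To illustrate the dual encoding: a chillu can appear either as a single atomic
code point (Unicode 5.1 and later) or as the older sequence of base consonant +
virama + ZWJ. Below is a minimal sketch of folding the older sequences into the
atomic form. The names `normalize_chillus` and `describe` are hypothetical and
are not taken from chardetails; chillu K is left out for brevity.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu code point.
_BASE_TO_CHILLU = {
    "\u0d23": "\u0d7a",  # NNA -> CHILLU NN
    "\u0d28": "\u0d7b",  # NA  -> CHILLU N
    "\u0d30": "\u0d7c",  # RA  -> CHILLU RR
    "\u0d32": "\u0d7d",  # LA  -> CHILLU L
    "\u0d33": "\u0d7e",  # LLA -> CHILLU LL
}

# Older three-code-point sequence -> atomic chillu.
LEGACY_TO_ATOMIC = {
    base + VIRAMA + ZWJ: chillu for base, chillu in _BASE_TO_CHILLU.items()
}


def normalize_chillus(text):
    """Replace consonant + virama + ZWJ sequences with atomic chillus."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]
```

Calling `describe()` on both forms of the same chillu shows one code point for
the atomic form and three for the older sequence, which is why plain string
comparison treats them as different.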

>
> In my opinion, something like stemmer(lang='ml') or stemmer.ml
> <http://stemmer.ml> is shorter, but still comprehensible.

I initially thought of that. But at the time, I felt it would mean stuffing too
much code into a single file, plus another set of conditionals, especially if
the code differs between languages (I am not sure whether it will; I know only
about Malayalam).
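
One possible way to get a `stemmer(lang='ml')` style entry point without a
growing chain of conditionals is a small registry that each language module
fills in. This is only a sketch: `register`, `stemmer` and both stemming
functions are hypothetical, and the rules inside them are placeholders, not
real stemming logic.

```python
# Registry mapping a language code to its stemming function.
_STEMMERS = {}


def register(lang):
    """Decorator that records a stemming function under a language code."""
    def decorator(func):
        _STEMMERS[lang] = func
        return func
    return decorator


def stemmer(lang):
    """Look up the stemming function for lang, or raise ValueError."""
    try:
        return _STEMMERS[lang]
    except KeyError:
        raise ValueError("no stemmer registered for %r" % lang) from None


# In practice each function could live in its own per-language module.
@register("ml")
def _stem_malayalam(word):
    # Placeholder: returns the word unchanged.
    return word


@register("en")
def _stem_english(word):
    # Toy rule for illustration only: drop a trailing "s".
    return word[:-1] if word.endswith("s") else word
```

With this layout, adding a language means adding one decorated function, and
the lookup code in `stemmer()` stays the same.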

>
> I also propose that the modules be moved to python3, break backwards
> compatibility and libindic be restructured.

A big plus one for that. We should drop Python 2 for good.

>
> Let me elaborate a little on restructuring. For example, spellchecker has its
> own implementation of the levenshtein. We could have this moved to a single
> util file, since it's a commonly used metric and possibly duplicated in
> multiple modules.

Yes, a common util module is good. We can discuss what to include.
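
As a starting point for that discussion, a shared Levenshtein helper could look
roughly like the sketch below. It is not the spellchecker's existing
implementation, and the function name is an assumption. It compares Python
string elements, i.e. Unicode code points.

```python
def levenshtein(a, b):
    """Return the edit distance between sequences a and b.

    Insertions, deletions and substitutions each cost 1.
    """
    # Keep the shorter sequence in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

Because the distance is counted in code points, a word typed with an atomic
chillu and the same word typed with the older three-code-point sequence come
out as different strings, so normalising input first may be worth considering.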

>
> Alternatively, we can avoid a lot of headaches/reinventing the wheel by using
> existing functions in nltk. nltk already has implementations of basic NLP
> functions like the distance metrics, ngrams. This lets us avoid the trouble
> of having to create and maintain documentation. It also makes it easier for
> new contributors with experience in the library.

We have to make sure they work with Indic languages too.
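
One concrete thing to check is how a generic character n-gram helper treats
Indic text. The sketch below does not use nltk; `char_ngrams` is a hypothetical
stand-in that walks a string code point by code point, as any helper that
simply iterates over a Python string would. It shows a vowel sign ending up at
the start of a bigram, detached from its consonant.

```python
import unicodedata


def char_ngrams(text, n):
    """Return all n-grams of text, one tuple of code points per n-gram."""
    return [tuple(text[i:i + n]) for i in range(len(text) - n + 1)]


def starts_with_mark(gram):
    """True if the first code point is a combining mark (category M*)."""
    return unicodedata.category(gram[0]).startswith("M")


# The word "Malayalam" in Malayalam script: MA, LA, YA, AA sign, LLA, anusvara.
word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"

bigrams = char_ngrams(word, 2)

# Bigrams that begin with a dependent sign separated from its base consonant.
split_bigrams = [g for g in bigrams if starts_with_mark(g)]
```

If n-grams are supposed to follow syllables or grapheme clusters instead, the
text would need to be segmented before being passed to such a helper.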



-- 
Balasankar C
http://balasankarc.in


