varnamproject-discuss

Re: [Varnamproject-discuss] Update


From: Navaneeth K N
Subject: Re: [Varnamproject-discuss] Update
Date: Tue, 08 Jul 2014 11:00:59 +0530
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Sounds good...

On 7/7/14 11:51 PM, Kevin Martin wrote:
> I'll put this on the blog soon. Here is a quick review of what I
> did:
> 
> 1) Now looks ahead one syllable before stemming. Solved a lot of
> problems with the stem rule parsing. This is done by creating an
> exceptions table in ml.vst. Each row contains a stem rule and its
> exception. The stem rule is not applied if the syllable preceding
> the suffix is the exception.
> 
> 2) Will not stem very short words, and will not stem if the word
> minus the suffix would be fewer than 2 syllables long.
> 
> 3) Created a data set of 1000 words from the Malayalam Wikipedia
> history section and tested stemming on it. It stems with 94%
> accuracy (when a word is already in its base form, leaving it
> unstemmed counts as a correct result).
> 
> Within the next few days:
> 
> 1) More data sets. Approximately 5000 words in total for testing.
> 
> 2) Tweak stem rules even more.
> 
> 3) Finally test the learning.
> 
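
For readers following the thread, the three points in the review above can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual libvarnam code: words are assumed to be already split into syllables, and the rule table, exception table, romanized syllables, and the names stem() and accuracy() are all hypothetical placeholders rather than anything taken from ml.vst.

```python
# Hypothetical sketch: words are pre-split into syllables (lists of strings).
# STEM_RULES and EXCEPTIONS are made-up placeholders, not data from ml.vst.

MIN_STEM_SYLLABLES = 2  # point 2: at least 2 syllables must remain after stemming

# suffix (tuple of syllables) -> replacement syllables
STEM_RULES = {
    ("kal",): (),
    ("yil",): (),
}

# point 1: (suffix, preceding syllable) pairs for which the rule is skipped
EXCEPTIONS = {
    (("kal",), "na"),
}


def stem(syllables):
    """Return the stemmed syllable list, or the input unchanged."""
    word = tuple(syllables)
    for suffix, replacement in STEM_RULES.items():
        n = len(suffix)
        if len(word) <= n or word[-n:] != suffix:
            continue  # suffix does not match this word
        remainder = word[:-n]
        # Point 2: do not stem if too little of the word would remain.
        if len(remainder) < MIN_STEM_SYLLABLES:
            continue
        # Point 1: check the syllable preceding the suffix against the exceptions.
        if (suffix, remainder[-1]) in EXCEPTIONS:
            continue
        return list(remainder + replacement)
    return list(word)


def accuracy(samples):
    """Point 3: fraction of (syllables, expected_stem) pairs stemmed correctly.

    A base-form word that is left unstemmed counts as correct when its
    expected stem equals the word itself.
    """
    samples = list(samples)
    correct = sum(1 for word, expected in samples if stem(word) == list(expected))
    return correct / len(samples)
```

Keying the exception table on the pair (suffix, preceding syllable) keeps each exception tied to a single stem rule, which matches the "each row contains a stem rule and its exception" description; a real table would of course hold Malayalam text rather than these placeholders.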

- -- 
Cheers,
Navaneeth
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJTu4ITAAoJEHFACYSL7h6kjqEH/1VlMAX0loSaug5Eh+FjVfmu
54xV203KT/6k3nqaT0feGBr1xUsymxkJrjb6iFXbhd6ygjU+0DWJLfC/Vhjk0rq9
4Pb0nGegnIE37eEjpVrGJlrFq65SWs1NSNdyqUB1puYgDYyUF2KsOLzbRCo+rjp2
Gj+/mV27IbxTRV2SfB1twSDQFraQoQbA73TZ0xc8pMN5c3FLIuydNB0ZV7BL/uMv
zvEdlzRh/AI9Oz2WZpNiTjWNw9je0i2CR5392lTB14mCotcIyh1CIhj8dOK386wS
ETKC6O2FJkQie8kWBF9V3opsUvPH5Z4g1veb7FwMJc5dKM9hWjLZnwRAlTIN3fI=
=wUax
-----END PGP SIGNATURE-----


