Re: [Varnamproject-discuss] Update
From: Navaneeth K N
Subject: Re: [Varnamproject-discuss] Update
Date: Tue, 08 Jul 2014 11:00:59 +0530
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Sounds good...
On 7/7/14 11:51 PM, Kevin Martin wrote:
> I'll put this in the blog soon. Just a quick review of what I did:
>
> 1) The stemmer now looks ahead one syllable before stemming, which
> solved a lot of problems with stem rule parsing. This is done with
> an exceptions table in ml.vst. Each row contains a stem rule and its
> exception. The stem rule is not applied if the syllable preceding
> the suffix is the exception.
>
> 2) It will not stem very short words, and will not stem a word if
> what remains after removing the suffix is less than 2 syllables
> long.
>
> 3) Created a data set of 1000 words from the history section of
> Malayalam Wikipedia and tested stemming on it. It stems with 94%
> accuracy (when a word is already in its base form, leaving it
> unstemmed counts as a correct result).
>
> Within the next few days:
>
> 1) More data sets, approximately 5000 words in total for testing.
> 2) Tweak the stem rules even more.
> 3) Finally, test the learning.
>
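For anyone following the thread, the rules described above can be sketched
roughly as follows. This is only an illustration in Python, not the actual
libvarnam code or the real layout of the exceptions table in ml.vst. Words
are assumed to be already split into syllables, and every rule, exception,
threshold and name below (STEM_RULES, EXCEPTIONS, MIN_WORD_SYLLABLES,
MIN_STEM_SYLLABLES, stem, accuracy) is a made-up placeholder.

```python
from typing import List, Sequence, Set, Tuple

Syllables = Tuple[str, ...]

# Hypothetical stem rules as (suffix, replacement), longest suffix first.
# The syllables are Latin placeholders, not real Malayalam rules.
STEM_RULES: List[Tuple[Syllables, Syllables]] = [
    (("ka", "lil"), ()),
    (("kal",), ()),
    (("il",), ()),
]

# Hypothetical exceptions table: (suffix, syllable preceding the suffix).
EXCEPTIONS: Set[Tuple[Syllables, str]] = {(("il",), "va")}

MIN_WORD_SYLLABLES = 3  # assumed cut-off for "very short words"
MIN_STEM_SYLLABLES = 2  # what is left after the suffix must be this long


def stem(word: Sequence[str]) -> Syllables:
    """Return the stemmed syllables, or the word unchanged if no rule applies."""
    syllables = tuple(word)
    if len(syllables) < MIN_WORD_SYLLABLES:
        return syllables
    for suffix, replacement in STEM_RULES:
        n = len(suffix)
        if len(syllables) <= n or syllables[-n:] != suffix:
            continue
        remaining = syllables[:-n]
        # Do not stem if too little of the word would remain.
        if len(remaining) < MIN_STEM_SYLLABLES:
            continue
        # Check the syllable before the suffix against the exceptions table.
        if (suffix, remaining[-1]) in EXCEPTIONS:
            continue
        return remaining + replacement
    return syllables


def accuracy(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Fraction of (word, expected stem) pairs that stem() gets right.

    A word already in base form has itself as the expected stem, so
    leaving it unstemmed counts as correct.
    """
    correct = sum(1 for word, expected in pairs if stem(word) == tuple(expected))
    return correct / len(pairs)
```

A data set here would just be a list of (word, expected stem) pairs passed
to accuracy(). A rule skipped because of an exception or the length check
falls through to the next rule, which is a design choice of this sketch
and may not match how the real stemmer behaves.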
- --
Cheers,
Navaneeth
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org
iQEcBAEBCgAGBQJTu4ITAAoJEHFACYSL7h6kjqEH/1VlMAX0loSaug5Eh+FjVfmu
54xV203KT/6k3nqaT0feGBr1xUsymxkJrjb6iFXbhd6ygjU+0DWJLfC/Vhjk0rq9
4Pb0nGegnIE37eEjpVrGJlrFq65SWs1NSNdyqUB1puYgDYyUF2KsOLzbRCo+rjp2
Gj+/mV27IbxTRV2SfB1twSDQFraQoQbA73TZ0xc8pMN5c3FLIuydNB0ZV7BL/uMv
zvEdlzRh/AI9Oz2WZpNiTjWNw9je0i2CR5392lTB14mCotcIyh1CIhj8dOK386wS
ETKC6O2FJkQie8kWBF9V3opsUvPH5Z4g1veb7FwMJc5dKM9hWjLZnwRAlTIN3fI=
=wUax
-----END PGP SIGNATURE-----