[Tetum-translators] Re: Tetum wordlist "official"
From: Peter Gossner
Subject: [Tetum-translators] Re: Tetum wordlist "official"
Date: Tue, 25 May 2004 00:13:44 +0930
Wow Hi Kevin !
Mate that is really impressive... FanBloodyTastic
'Scuse the strange mail format; this seemed easier at the time...
<quote who=KevinPS>
Pete,
Great news -- your word list was more than sufficient
and the crawler bootstrapped without a hitch -- it's grabbed
more than 400,000 words of Tetum already.
Here are the docs after just 24 hours online:
http://borel.slu.edu/tet.html
Perhaps, since you already have this clean list of
5000+ words, I should just run the corpus through
my filters and generate some candidate words for
you to look over -- by the looks of things, without
too much effort on your part we could easily have
a working aspell-tet package sometime this week!
</quote>
<reply who=pete>
Have everything ready to go from ASPELL CVS here... (I think)
The "other Kevin" (Aspell Kevin) is building a demo from the original
5000 list :) I think I get it. Some of the soundslike mappings are going
to be ... interesting. (Tetum with an Au. accent ... lol, nah, I won't.)
I also now remember why I never bothered to learn C++ :)
(How slow is the compiler!)
</reply>
<quote who=KevinPS>
Let me know and I'll fire off some lists to you.
I guess since you aren't a native speaker this
could be a bit harder than usual (i.e. maybe you'll
need to consult a dictionary occasionally) but the filters
and the frequency counts I'll give you will make things
much easier.
</quote>
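For readers following along, here is a rough sketch of what "candidate words with frequency counts" could look like. It is not Kevin's actual filter code, which is not shown in this thread; the function name `candidate_words`, the tokenizer regex, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an internal
# apostrophe or hyphen (so a form like "ha'u" stays one token).
WORD_RE = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")


def candidate_words(corpus_text, known_words, min_count=2):
    """Return (word, count) pairs for corpus words missing from known_words.

    Results are sorted by descending count, then alphabetically, so the
    most frequent unknown words are reviewed first.
    """
    known = {w.lower() for w in known_words}
    counts = Counter(w.lower() for w in WORD_RE.findall(corpus_text))
    candidates = [
        (word, n)
        for word, n in counts.items()
        if word not in known and n >= min_count
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates
```

Called with the crawled text and the existing 5000-word list, this would give a reviewer a short list ordered by frequency, and raising `min_count` trims one-off typos and stray foreign words.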
<reply who=pete>
Sounds really good.
Please fire away... what else do I need ?
Um 400 thousand might be a good place to STOP :)
LOL.
</reply>
<quote who=KevinPS>
-Kevin
PS if you see any non-Tetum docs on the page above,
or repeats, please let me know since they'll skew the
stats a bit...
</quote>
<reply who=pete>
Looks very clean to me. I will have another look tomorrow.
I thought some may be Indonesian or Portuguese but they seem pretty
clean.
Mate, that's VERY impressive. I guess some may be duplicate
content (the wiki entry seems to be everywhere, for instance... though not
on your list, which is also impressive...). Is the code Open Source?
The original 5000 was fairly carefully gathered from "reputable /
official sources". It looks like the few random pages I sampled were
mostly pure "Dili-Tetum", with some Indonesian borrowed words, but that is
real life. That is how the real Timorese use the language. So
for a GP dictionary... Excellent!
I wish my Tetum were that good, but to me it seems great.
I will CC some real Tetum speakers and beta test the aspell dictionary
(of course) with them and with some written (paper) references as well.
Kevin, this is unbelievably great news! I thought the 5000 was good :> !
Thank you, thank you.
Pete
</reply>
--
Todays fortune:
Today is the first day of the rest of your life.
< http://www.gnu.org/software/tetum/ >
< http://bigbutton.com.au/~gossner >
< address@hidden >