Re: [silpa-discuss] Kannada -English Dictionary bot launched
From: Santhosh Thottingal
Subject: Re: [silpa-discuss] Kannada -English Dictionary bot launched
Date: Mon, 22 Nov 2010 22:39:29 -0800
User-agent: Zoho Mail
---- On Mon, 22 Nov 2010 19:55:18 -0800 Praveen A wrote ----
>Can we think of a plugin to mediawiki that will respond in dict
>protocol? Mediawiki knows well about the content and instead of trying
>to make sense at our end it would be better to send better response in
>the first place. What do you think?
It cannot be a plugin. Per the DICT protocol (RFC 2229), a dict server is a
standalone server that listens on TCP port 2628 (configurable) and responds to
requests. A MediaWiki plugin runs only inside HTTP requests, so it cannot do that.
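
To make the difference from an HTTP plugin concrete, here is a minimal sketch
of the client side of a DICT exchange. It assumes a server that follows RFC
2229 (default TCP port 2628). The function names are made up for this
illustration, the host is whatever dict server you point it at, and the
connection part has not been run against any particular server.

```python
import socket

DICT_PORT = 2628  # default TCP port assigned by RFC 2229


def build_define_command(word, database="*"):
    """Return the DEFINE request line; '*' asks every database on the server."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'DEFINE {database} "{escaped}"\r\n'.encode("utf-8")


def parse_status(line):
    """Split a status line such as '150 1 definitions retrieved' into (code, text)."""
    code, _, text = line.strip().partition(" ")
    return int(code), text


def define(word, host, port=DICT_PORT, database="*"):
    """Send one DEFINE request and return the raw response lines (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        banner_code, _ = parse_status(reader.readline())
        if banner_code != 220:  # 220 is the connection banner
            raise ConnectionError("unexpected banner from server")
        sock.sendall(build_define_command(word, database))
        lines, in_body = [], False
        for raw in reader:
            line = raw.rstrip("\r\n")
            lines.append(line)
            if in_body:
                # a definition body ends with a line holding a single '.'
                in_body = line != "."
                continue
            code, _ = parse_status(line)
            if code == 151:  # one definition follows as text
                in_body = True
            elif code == 250 or code >= 400:  # ok, or an error such as 552 no match
                break
        sock.sendall(b"QUIT\r\n")
        return lines
```

The point is only that the server side has to hold a TCP socket open and speak
this line protocol itself, which is a separate process from the wiki.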
Wiktionary already provides an API for accessing the data.
Try this URL in your browser:
http://ml.wiktionary.org/w/api.php?action=parse&format=xml&prop=text|revid|displaytitle&callback=?&page=gold
This API call returns the meaning of the word 'gold' in XML format. Look at
the XML data and see how bad it is.
Compare it with the same call on the English Wiktionary:
http://en.wiktionary.org/w/api.php?action=parse&format=xml&prop=text|revid|displaytitle&callback=?&page=gold
The same request with format=json,
http://ml.wiktionary.org/w/api.php?action=parse&format=json&prop=text|revid|displaytitle&callback=?&page=gold
returns JSON. But again the data is in bad shape and difficult to parse.
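
The same request can be made from a script. The sketch below builds the
action=parse URL and pulls the HTML out of the JSON reply. Assumptions on my
side: https instead of http, the callback parameter dropped so that plain JSON
comes back instead of JSONP, a made-up User-Agent string, and the function
names, which are only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_url(page, lang="ml"):
    """Build the action=parse URL for one page on <lang>.wiktionary.org."""
    params = {
        "action": "parse",
        "format": "json",
        "prop": "text|revid|displaytitle",
        "page": page,
    }
    return f"https://{lang}.wiktionary.org/w/api.php?" + urlencode(params)


def extract_html(payload):
    """Return the rendered HTML held in a JSON reply from action=parse."""
    parsed = json.loads(payload)
    if "error" in parsed:
        raise LookupError(parsed["error"].get("info", "API error"))
    text = parsed["parse"]["text"]
    # The legacy JSON format wraps the HTML as {"*": "..."};
    # formatversion=2 returns it as a plain string.
    return text["*"] if isinstance(text, dict) else text


def fetch_entry_html(page, lang="ml"):
    """Fetch one entry over the network (hypothetical helper, needs connectivity)."""
    request = Request(
        build_parse_url(page, lang),
        headers={"User-Agent": "silpa-dict-sketch/0.1"},  # assumed value
    )
    with urlopen(request, timeout=10) as response:
        return extract_html(response.read().decode("utf-8"))
```

Whatever extract_html returns is still a blob of HTML, so the hard part, finding
the actual meanings inside it, remains.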
What is required is fixing this API. Instead of HTML markup in the API result,
we need structured data. But that is not easy until Wiktionary defines
meanings in a structured manner instead of allowing users to add data in any
format. At present an entry resembles a wiki page that talks about the meaning
of the word. This needs to be redesigned if the large amount of valuable data
in the Wiktionaries is to be suitable for machine consumption.
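
To illustrate the problem, here is a small sketch that flattens such HTML into
its text pieces using Python's standard html.parser. The class and function
names are hypothetical, and the HTML fragment in the usage note is made up for
the example, not real Wiktionary output.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text chunks of an HTML fragment, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def flatten(html):
    """Return the text chunks of an HTML fragment; all tag structure is dropped."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks
```

For the made-up fragment '<h3>Noun</h3><ol><li>A <b>yellow</b> metal</li></ol>'
this gives ['Noun', 'A', 'yellow', 'metal']. Nothing in that list says which
string is the part of speech and which is the definition, and every page is
free to lay these out differently.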
Since there are Wiktionaries in many languages, a cross-language lexicon is a
great thing we can derive from this large repository of words. It could use an
open lexicon standard such as OLIF (http://www.olif.net/). If the data is
POS-tagged, it will be a valuable resource for anybody working on machine
translation or other NLP-related areas.
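
As a rough idea of what a structured, POS-tagged, cross-language entry could
look like, here is a sketch. The field names are invented for this example
and are not the OLIF schema or anything Wiktionary provides today.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sense:
    """One meaning of a headword (hypothetical structure)."""
    pos: str                                           # part-of-speech tag, e.g. "noun"
    gloss: str                                         # short definition
    translations: dict = field(default_factory=dict)   # language code -> word


@dataclass
class Entry:
    """One headword with its senses (hypothetical structure)."""
    headword: str
    lang: str
    senses: list = field(default_factory=list)


def to_json(entry):
    """Serialise an entry so that a client or dict server can consume it directly."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


gold = Entry("gold", "en", [Sense("noun", "a yellow precious metal", {"ml": "സ്വർണ്ണം"})])
```

With data in this shape, to_json(gold) can be handed to any client without
scraping markup, and the "pos" field is available for filtering.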
-Santhosh