[Savannah-register-public] [task #7015] Submission of Enhanced Brill's P
From: Golam Mortuza Hossain
Subject: [Savannah-register-public] [task #7015] Submission of Enhanced Brill's Parts-of-Speech Tagger
Date: Sun, 17 Jun 2007 15:00:19 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)
URL:
<http://savannah.nongnu.org/task/?7015>
Summary: Submission of Enhanced Brill's Parts-of-Speech
Tagger
Project: Savannah Administration
Submitted by: golam
Submitted on: Sunday 06/17/2007 at 15:00
Should Start On: Sunday 06/17/2007 at 00:00
Should be Finished on: Wednesday 06/27/2007 at 00:00
Category: Project Approval
Priority: 5 - Normal
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
Details:
A new project has been registered at Savannah
This project account will remain inactive until a site admin approves or
discards the registration.
= Registration Administration =
While this item will be useful to track the registration process, *approving
or discarding the registration must be done using the specific Group
Administration
<https://savannah.nongnu.org/siteadmin/groupedit.php?group_id=9342> page*,
accessible only to site administrators, effectively *logged as site
administrators* (superuser):
* Group Administration
<https://savannah.nongnu.org/siteadmin/groupedit.php?group_id=9342>
= Registration Details =
* Name: *Enhanced Brill's Parts-of-Speech Tagger*
* System Name: *gposttl*
* Type: non-GNU software & documentation
* License: Other (The following licence, which is GPL-compatible, applies to
the part originally written by Eric Brill. This part is
marked with copyright notices. The rest of the program is
licensed under GPL v2; see further down.
______________________________________________________________________
License for the part of the program written by Eric Brill
______________________________________________________________________
This software was written by Eric Brill.
This software is being provided to you, the LICENSEE, by the
Massachusetts Institute of Technology (M.I.T.) under the following
license. By obtaining, using and/or copying this software, you agree
that you have read, understood, and will comply with these terms and
conditions:
Permission to [use, copy, modify and distribute, including the right to
grant others rights to distribute at any tier, this software and its
documentation for any purpose and without fee or royalty] is hereby
granted, provided that you agree to comply with the following copyright
notice and statements, including the disclaimer, and that the same
appear on ALL copies of the software and documentation, including
modifications that you make for internal use or for distribution:
Copyright 1993 by the Massachusetts Institute of Technology and the
University of Pennsylvania. All rights reserved.
THIS SOFTWARE IS PROVIDED "AS IS", AND M.I.T. MAKES NO REPRESENTATIONS
OR WARRANTIES, EXPRESS OR IMPLIED. By way of example, but not
limitation, M.I.T. MAKES NO REPRESENTATIONS OR WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF
THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY
PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
The name of the Massachusetts Institute of Technology or M.I.T. may NOT
be used in advertising or publicity pertaining to distribution of the
software. Title to copyright in this software and any associated
documentation shall at all times remain with M.I.T., and USER agrees to
preserve same.
______________________________________________________________________
License for the rest of the program
______________________________________________________________________
GNU GENERAL PUBLIC LICENSE
Version 2)
----
==== Description: ====
GPoSTTL
(Brill's Parts-of-Speech Tagger, with built-in Tokenizer and Lemmatizer)
GPoSTTL is an enhanced version of Brill's rule-based Parts-of-Speech Tagger
for English, with a built-in Tokenizer and Lemmatizer. It reads from FILE or
STDIN and writes to STDOUT. It is based on the LPost package by Jimmy Lin
(jimmylin at umd.edu). LPost itself is based on Benjamin Han's ePost package,
which is a cleaned-up version of Eric Brill's original code. The primary lemma
list was taken, with permission, from e_lemma.txt (Ver.1), compiled by Prof.
Yasumasa Someya (someya at someya-net.com). It has since been extended with
hundreds of additional entries, and that work is ongoing.
Motivations:
* GPoSTTL has been developed as a free software alternative to
TreeTagger [1], a non-free Penn Treebank tagger developed by Prof. Helmut
Schmid. GPoSTTL can be used as a drop-in substitute for TreeTagger.
As a concrete example, GPoSTTL is a crucial component of Anubadok [2], a
GPL-licensed machine translator from English to Bengali.
In its default mode, GPoSTTL uses an enhanced Penn tagset to make its output
compatible with that of TreeTagger. In particular, the second letter of the
verb tags distinguishes between "be" verbs (B), "have" verbs (H) and other
verbs (V). Because the lexicon contains the original Penn tagset, this
enhancement is applied at the last step of the tagging procedure.
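To illustrate the verb-tag enhancement described above, here is a minimal
sketch in C. It is not taken from the GPoSTTL sources: the function name
enhance_verb_tag, its signature, and the choice to decide between B, H and V
by comparing the lemma with "be" and "have" are all assumptions made for
illustration. It only shows how the second letter of a Penn verb tag (VB,
VBD, VBG, VBN, VBP, VBZ) could be rewritten once tagging has finished.

```c
#include <string.h>

/*
 * Hypothetical helper (an assumption, not GPoSTTL's actual code).
 * Copies penn_tag into out, rewriting the second letter of Penn verb
 * tags so that it reads B for "be", H for "have" and V for any other verb.
 * Non-verb tags are copied unchanged.
 * Returns 0 on success, -1 if out is too small to hold the tag.
 */
int enhance_verb_tag(const char *penn_tag, const char *lemma,
                     char *out, size_t out_size)
{
    size_t len = strlen(penn_tag);

    if (len + 1 > out_size)
        return -1;
    memcpy(out, penn_tag, len + 1);   /* copy the tag including the NUL */

    /* Only tags starting with "VB" are Penn verb tags; leave the rest. */
    if (len < 2 || penn_tag[0] != 'V' || penn_tag[1] != 'B')
        return 0;

    if (strcmp(lemma, "be") == 0)
        out[1] = 'B';                 /* "be" verbs keep the B */
    else if (strcmp(lemma, "have") == 0)
        out[1] = 'H';                 /* "have" verbs */
    else
        out[1] = 'V';                 /* all other verbs */

    return 0;
}
```

Under these assumptions, the Penn tag VBD with lemma "have" becomes VHD, VBG
with lemma "run" becomes VVG, and a non-verb tag such as NN is left as it is.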
GPoSTTL is written in C and the source code is available
from http://www.imsc.res.in/~golam/gposttl/
Ref:
[1]
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
[2] http://www.imsc.res.in/~golam/anubadok/
==== Other Software Required: ====
The program has no external dependencies apart from
those available in a free operating system.
==== Other Comments: ====
The project is currently hosted on my personal webpage at IMSc[1].
I recently left IMSc after completing my PhD, so the current
web space will expire within a few months.
[1] http://www.imsc.res.in/~golam/
_______________________________________________________
Reply to this item at:
<http://savannah.nongnu.org/task/?7015>
_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/