Re: [Tetum-translators] Fw: BlackDog-WhiteBoard


From:	Peter Gossner
Subject:	Re: [Tetum-translators] Fw: BlackDog-WhiteBoard
Date:	Thu, 18 Mar 2004 02:13:58 +1030
On Tue, 16 Mar 2004 16:14:04 -0800 (PST)  from a terminal far far away
<Lev/>  wrote:
>
>OK, I like this a lot. The flowchart is following the
>discussion thread and incorporating material
>appropriately. I particularly like the 'optional learn
>from selections' user choice.

I think that in the long term this will be the most powerful "feature"
of the application.

>
>I think there are three further elements that have to
>be considered and incorporated.
>
>1) Rather than 'words' use 'phrases'. This seems to be
>the case with all machine translation literature that
>I have read. Gets around problems of definite
>articles, conjunctions of verbs etc.
>
OK. Will do.
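
A minimal sketch of what a phrase-first lookup could look like, assuming a greedy longest-match strategy (one common approach, not something settled in this thread). The table entries, the `<unknown>` tag and the function name are hypothetical placeholders, not real Tetum data or an agreed format.

```python
# Hypothetical phrase table: keys are tuples of source words, values are
# placeholder target strings (not real translations).
PHRASE_TABLE = {
    ("good", "morning"): "TARGET_GOOD_MORNING",
    ("good",): "TARGET_GOOD",
    ("morning",): "TARGET_MORNING",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def translate_phrases(tokens):
    """Greedy longest match: at each position try the longest phrase first."""
    output = []
    i = 0
    while i < len(tokens):
        longest = min(MAX_PHRASE_LEN, len(tokens) - i)
        for size in range(longest, 0, -1):
            key = tuple(tokens[i:i + size])
            if key in PHRASE_TABLE:
                output.append(PHRASE_TABLE[key])
                i += size
                break
        else:
            # Nothing matched, not even the single word: keep it and
            # mark it so the user can deal with it later.
            output.append("<unknown>%s</unknown>" % tokens[i])
            i += 1
    return output
```

With this table, "good morning" is matched as one phrase before the single words "good" and "morning" are ever tried, which is the point of moving from words to phrases.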

>2) Between 5 and 7 (where's 6?) there needs to be a
>'grammar translation phase'. This may be one of the
>hardest elements.
>
Yep. Very hard. I have started a little study of this but will plead
"poor working knowledge" here. There is a "language" called Malaga that
even has an Emacs mode... (of course!)

<quote from on board (huge docs)>
The Name ``Malaga'' has two different meanings: on the one hand, it is
the name of a special purpose programming language, namely a language to
implement grammars for natural languages. On the other hand, it is the
name of a program package for development of Malaga Grammars and testing
them by analysing words and sentences. 

``Malaga'' is an acronym for ``Merely a Left-Associative-Grammar
Application''.
</quote>
http://www.linguistik.uni-erlangen.de/~bjoern/malaga/

Now I am not yet sure how, or even if, this would be appropriate, but it
seems to provide functionality we may (will) need. Reading through the
manuals for it, I have some concerns about how it does the grammatical
mapping... it could run into exponential memory use issues...

The concepts involved seem similar.. I will start a separate thread if
you're interested. (personally I think you could come up with something
betterer :)

>3) I think it is important to specify what sort of
>interface or programming language we're using at each
>step. Just for my own sanity before I start learning
>Python, right?
LOL.. I deliberately did not do that.. I was trying to keep it
abstracted away from the dirty stuff. 

I suggest the following; they are all accessible from Python.
All should (theoretically) be portable across OS's (even them!) and
other Unixen..

User interface:
pyGTK2 (and glade2 as a great quick tool.. it really does rock)
I expect some GTK2 code will be required directly as well. (C with an
object approach.. as opposed to C++, which while attractive in itself is
a pig to debug and has many exceptions to accommodate)

Databases:
Nice if we can keep this "pure" SQL and use say postgres or mysql as
well but Python has a few approaches we can use .. I am thinking of
using a toolkit called "gadfly".. um no idea why that name. there are
others though. Gadfly can operated in a fast C mode as well which is
attractive.. there are also python interfaces to postgres..

python2.3-gadfly - SQL database and parser generator for Python 2.3
python2.3-kjbuckets - Set and graph data types for Python 2.3 
(kjbuckets Goes with gadfly)
python2.3-gdbm - GNU dbm database support for Python (v2.3) 
(an alternative)
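
A rough sketch of the "pure SQL" idea, assuming a single two-column table. It uses Python's standard `sqlite3` module purely as a stand-in so it runs anywhere; it is not gadfly or postgres, and the table name, column names and helper functions are all hypothetical.

```python
import sqlite3


def open_dictionary(path=":memory:"):
    """Open (or create in memory) a dictionary database with one table."""
    conn = sqlite3.connect(path)
    # Plain SQL only, so the schema should carry over to other engines.
    conn.execute(
        "CREATE TABLE phrase ("
        " source VARCHAR(200) NOT NULL,"
        " target VARCHAR(200) NOT NULL)"
    )
    return conn


def add_translation(conn, source, target):
    """Store one source phrase and one of its translations."""
    # The '?' placeholder style is what sqlite3 uses; other drivers differ.
    conn.execute(
        "INSERT INTO phrase (source, target) VALUES (?, ?)",
        (source, target),
    )
    conn.commit()


def lookup(conn, source):
    """Return every stored translation for a source phrase, sorted."""
    rows = conn.execute(
        "SELECT target FROM phrase WHERE source = ? ORDER BY target",
        (source,),
    )
    return [row[0] for row in rows]
```

Keeping the SQL this plain means only the connection call and the placeholder style should need changing if the backend is swapped later.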

General algorithm (the engine room):
Python 2.3+

XML interfaces (as required):
PythonXML tools abound.
Gnome's (independent) libxml2 is VERY good (IMHO).. Though it is C based, we
should be able to use it from Python without too much grief.

Logic engine: (if required)
CWM and N3 +  friends from W3C (all python)

DictServer:
Serpento (Python UTF8 version of dictd)
(Works fine here)

Spell Checker :
Aspell..
or our own (gulp)
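
A small sketch of the early checking pass described in the whiteboard message quoted further down: flag words the spell checker does not know, and known words that have no translation. Plain Python sets and dicts stand in for Aspell and the dictionary here; the function name and the sample data are hypothetical.

```python
def classify_words(words, source_lexicon, translations):
    """Split words into 'unknown to the checker' and 'known, no translation'.

    source_lexicon: set of correctly spelled source-language words
                    (stand-in for a real spell checker such as Aspell).
    translations:   dict mapping source words to target words
                    (stand-in for the real dictionary).
    """
    unknown = []
    untranslated = []
    for word in words:
        key = word.lower()
        if key not in source_lexicon:
            unknown.append(word)        # probably a typo or a new word
        elif key not in translations:
            untranslated.append(word)   # spelled fine, dictionary gap
    return unknown, untranslated
```

Both lists can then be shown to the user before the translation engine runs, so the engine only ever sees words it has a chance of handling.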

Interprocess communication:
e.g. for drag and drop or signals to another running app..
GTK (via pyGTK), and it looks like Gnome's D-Bus may be a reasonable call
here as well if we need it; otherwise there is the usual Xlib stuff...
(GTK+ should have all we need... D-Bus sounds promising... not
something I have looked into in any great depth)

Any other protocols (e.g TCP/IP UDP/mailz) should all be available
directly from python or via the very very cool (though huge) "Twisted"
set of tools.

http://www.onlamp.com/pub/a/python/2004/01/15/twisted_intro.html
http://twisted.sourceforge.net/TwistedDocs-1.1.1/howto/index.xhtml
http://twisted.sourceforge.net/TwistedDocs-1.1.1/howto/enterprise.xhtml
(some database stuff there --hmm-- most things are possible reasonably
neatly in python :)


>
>Again I'd like to reiterate that this is extremely
>good work. I'm quite impressed.

Shucks and stuff :)

OK will do an updated flowchart and post (with step 6 :) ASAP (couple
days max)

If that is close I would then like to design an overview of the API
(AppProgInterface) with an idea to set out naming and coding approaches
at an early stage.. One "down side" with Python is that almost anything
goes, so I think it's really important to map out the namespaces and
possible modules, classes (scopes) etc. as early as possible...
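
One possible way to pin down the class boundaries early, sketched with abstract base classes from the standard `abc` module (modern Python syntax). Every class and method name here is hypothetical; this is only an illustration of fixing interfaces before backends, not an agreed design.

```python
from abc import ABC, abstractmethod


class SpellChecker(ABC):
    """Interface for any spell-check backend (Aspell, or our own)."""

    @abstractmethod
    def is_known(self, word):
        """Return True if the word is spelled correctly."""


class PhraseStore(ABC):
    """Interface for any dictionary backend (SQL database, dict server)."""

    @abstractmethod
    def translate(self, phrase):
        """Return the target-language phrase, or None if there is none."""


class SetSpellChecker(SpellChecker):
    """Toy backend for testing: a word is known if it is in a set."""

    def __init__(self, words):
        self._words = set(words)

    def is_known(self, word):
        return word.lower() in self._words


class DictPhraseStore(PhraseStore):
    """Toy backend for testing: translations held in a plain dict."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def translate(self, phrase):
        return self._entries.get(phrase.lower())
```

The engine room would then be written against `SpellChecker` and `PhraseStore` only, so the toy backends can be replaced by real ones without touching it.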

I should be able to do this with DIA in layers so that it's not too
confusing.

First step is a real name for this thing :)
(blackdog is just a little obscure)

I am thinking Koalia but that is Tetum specific...
How about Misty (Misty Is a Translator (for) You) and I like the song :)
or Spirit (aka Vodka) but also Syntax Parsing Interactive Reactive
Intelligence Translator...
)ok i will up my medication ( 

Catch ya
Pete

-- 
Todays fortune:
You will remember something that you should not have forgotten.
     
< http://www.gnu.org/software/tetum/ >
< http://bigbutton.com.au/~gossner >
< address@hidden >


>
>Regards,
>
>
>Lev
>
>
>
>
>--- Peter Gossner <address@hidden> wrote:
>> 
>> 
>> Forwarded message:
>> 
>> Date: Fri, 12 Mar 2004 02:02:07 +1030
>> From: Peter Gossner <address@hidden>
>> To: Lev <address@hidden>,
>> address@hidden
>> <address@hidden>,
>> address@hidden
>> <address@hidden>
>> Subject: BlackDog-WhiteBoard
>> 
>> 
>> Hi guys.
>> Attached should be a dia format whiteboard of a
>> revised approach
>> to the translator.
>> The idea is to chew twice and pass it on ... :)
>> 
>> The Design Objective is to capture as many errors as
>> practicable before
>> processing. The main change is putting the spell
>> checker in early and
>> whenever the user edits the document live.
>> 
>> The other is to identify unknown words from the
>> source language, (i.e.
>> the spell checker does not know them ) and known
>> words for which there
>> is no translation in the target language.
>> 
>> This should reduce errors and hits on the logic
>> engines / dictionaries
>> underneath everything.
>> It does mean that we need to build comprehensive
>> dictionaries and
>> syntax "mappings".. but that just takes time and
>> hopefully usage.
>> 
>> Procedurally the process goes in two major steps. 
>> (which may/will/can
>> loop) Step 1 contains process steps 0 -4. (enter the
>> source doc to ask
>> for translation)
>> Step 2 may involve user interaction and essentially
>> presents the user
>> with a translation they may edit in the target
>> language. (or perhaps
>> return to the source language step .. not sure yet)
>> I imagine the entire doc is initially translated
>> with tagged(and/ or
>> colour coded) half tone xml tags wrapped
>> around"problem" words or
>> phrases. I would hope to have a right button popup
>> menu to select single
>> words or a larger window for phrases ... not sure
>> about the phrases
>> bit.. 
>> 
>> I have posted the dia with the engine room turned
>> off select "view
>> => layers" to play with that.
>> 
>> Other news is that I think we can have our own
>> database(s) without too
>> much trouble..(and not lose too much speed).This
>> should work for
>> standalone and local system wide services. May still
>> use postgres for
>> Project wide stuff but more on that later.(as an
>> active server thingy)
>> (security, speed and such)
>> 
>> 
>> now as for the learning environment I have only a
>> conceptual scope on
>> this but I can see a couple of ways of tackling it:
>> 1/ is embedding W3C cwm and N3 tools
>> 2/ is building our own 
>> 3/ most likely; rewriting the W3C stuff to suit us.
>> 
>> As for an editor I had a quick look at Abiword and
>> that IS possible ...
>> though I will need to do some extra study as there
>> is a fair bit of C++
>> there. The advantages are many though. Most flexible
>> choice.
>> 
>> Then there is Gedit. Which should be simpler to
>> implement and has built
>> in most things we need.
>> 
>> Another is to simply use the GTK2 text editor widget
>> and implement our
>> own interfaces to Aspell etc. (probably smallest
>> footprint and just as
>> functional as Gedit .. also leaves us free of update
>> issues by other
>> projects) I will do some more homework on this but
>> the demo i looked at
>> (with the developers tools) looks quite doable.
>> 
>> Finally at some stage I reckon this would mesh very
>> well with emacs
>> xemacs and the MULE stuff (at least). However this
>> involves most work
>> for our users. (though also eventually the most
>> power.. hey of course)
>> 
>> By the way. Sylpheed (this mail app) has a good
>> interface to spell
>> checking.. I think I have the source code for it
>> stashed away somewhere
>> as well...
>> 
>> (though I hate the editor ...)
>> 
>> One last really important note on the editor /
>> interface :
>> If we do this carefully it should be possible to
>> produce plugins for Abi
>> and Gedit in any case.. It would probably just
>> involve a separate pop up
>> window.. and would also (probably) use the tabs
>> capacity of gedit and
>> another full editor window(view) for abiword...i.e.
>> It is really
>> practical to keep the logic engines away from the
>> default interface.
>> (a few buttons as possible in normal mode)
>> 
>> 
>> I will start a separate thread for a development
>> plan and approaches..
>> and am keen to get more input. !
>> 
>> A "good Thing" about the version 0.0 approach is
>> that it abstracts away
>> the logic algorithms as well :) (so we can freely
>> play with them.)
>> 
>> 
>> Well hope this all makes some kind of sense.
>> 
>> Pete
>> -- 
>> Message Composed: Thu Mar 11 15:30:03 UTC 2004
>> Calendar events:
>> Mar 13       The Allman Brothers record their live album
>> at the Fillmore
>> East, 1971 
>> Mar 13       "Striptease" introduced, Paris, 1894
>> < http://www.gnu.org/software/tetum/ >
>> < http://bigbutton.com.au/~gossner >
>> < address@hidden >
>> 
>> 
>> 
>> 
>> 
>
>> ATTACHMENT part 2 application/octet-stream
>name=BlackDog-WhiteBoard-0.0.dia
>> _______________________________________________
>> Tetum-translators mailing list
>> address@hidden
>>
>http://mail.nongnu.org/mailman/listinfo/tetum-translators
>> 
>
>
>=====
>Lev Lafayette
>address@hidden
>http://au.geocities.com/lev_lafayette
>