[Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English


From: Henri Chorand
Subject: [Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English
Date: Tue, 25 Feb 2003 00:07:33 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Keith Godfrey wrote:

> (...)
> I won't contribute to the religious war about which language to use
> - I do believe Java would probably be best for a multi-platform
> approach, but I have serious questions about it as well,

My (present) choice of Python is only based on the following:
- reasonably easy to learn
- portable & OO
- nice to prototype with, so it might let us try things out early
- (last but not least) good for calling code written in whatever else

That said, the one who provides the code will be the winner ;-)

So, hopefully we'll be able to play a bit with it, and if, later on, our
project is rewritten in whatever more efficient solution, no problemo.

> primarily because I don't know how Word, OpenOffice.org or what the
> target WordProcessor of choice interfaces with external applications.

This is an important issue.

As translators, our first aim was to provide a full-fledged standalone
translation editor, because it might be the most productive solution. We
then quickly realized that we would need as many conversion filters as
possible in order to translate whatever customers require, and we
thought of the huge job done by the OpenOffice.org team. It soon became
clear that their conversion filters would have to be integrated into the
Free CATS client(s).

There are two options here:
- We build this standalone interactive translation editor (which means
we have to adapt OpenOffice.org's filters)
- We build a tool that works from within OO, like Trados with MS Word
(we can then reuse these filters without any extra work).

In an ideal world, translators might ask for both a standalone
translation editor (like OmegaT) and integration within a word processor.

We did not want to use MS Office for this (M$ non-portable stuff,
horrible VBA). The other, and best, reason is that Yves Champollion,
the WordFast developer (who happens to be a descendant of the famous
Champollion who deciphered the Rosetta stone), has now agreed to make
his next version of WF compatible with Free CATS. That way, translators
will be able to work from within MS Word if they want to.

> For example, is an external Java/Python/C++ application able to send
> and receive data from a word processor and, further, tell a word
> processor what to do?  I know there are scripting capabilities built
> into most WPs, but I am completely ignorant on how far that
> functionality can take you.  If this is addressed in the specs you've
> released, please forgive me (I'm not spending much time on computers
> these days).

I can't pretend we delved deeply into OO's internals, but we found out
the following:
- OO has no macro language. Something may be done at a later stage.
- OO's API is well documented, so building something looks feasible,
especially if we only implement a toolbar calling a set of external
functions.

The other solution we see is (assuming we start from OmegaT, which is
an option I would personally favour):
- Separate client & TM server features in OmegaT
- Design a more sophisticated GUI (I believe we can bring a number of
clever ideas here)

In fact, both are desirable - we only need to settle priorities.

> [here's a little feverish rambling about OmegaT and KBabel pros and
> cons]
> OmegaT was originally designed as a client-server application and it
> wouldn't take that much effort to remove the UI and slave that
> functionality away to a word processor, assuming a communication link
> can be established (above).  Following that assumption, it should be
> reasonably easy (through WP scripting) to accomplish a Trados like
> interface.  I'm really not familiar with KBabel and so can't provide
> much input on its pros and cons, but I can say the fuzzy matching
> algorithm in OmegaT is pretty strong (yeah, like I can speak
> objectively about that *grin*).

I'll take your word for it :-))

We would all very much appreciate it if you could provide us with a
general description of OmegaT's indexing & fuzzy matching features. I
believe it might be exactly (or very close to) what we're trying to
build - let's not reinvent the wheel if we can avoid it. If the
remainder of the project team feels otherwise, they'll say so, but I
doubt it.
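
As a point of reference for that discussion, here is a minimal sketch
of what "fuzzy matching" against a translation memory can mean. It is
NOT a description of OmegaT's algorithm (which is exactly what is being
asked for above); it simply uses Python's standard difflib module to
score word-level similarity. The function name, the 0.7 threshold and
the sample memory are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples for TM entries similar to segment.

    memory is a list of (source, target) pairs. Similarity is computed on
    lower-cased, whitespace-separated words; threshold is an arbitrary cut-off.
    """
    query = segment.lower().split()
    hits = []
    for source, target in memory:
        score = SequenceMatcher(None, query, source.lower().split()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    # Best match first
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Hypothetical sample memory (English -> French)
tm = [
    ("Open the file menu.", "Ouvrez le menu Fichier."),
    ("Close the file menu.", "Fermez le menu Fichier."),
    ("Save the document.", "Enregistrez le document."),
]

for score, source, target in fuzzy_matches("Open the file menu.", tm):
    print("%.2f  %s  ->  %s" % (score, source, target))
```

A real engine would also need an index so that it does not scan the
whole memory for every segment; this sketch deliberately ignores that.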

Our main concern at present is how much formatting we want to keep in
the TM's translation units. If OmegaT keeps only paragraph-level
formatting, you will only keep plain text in TUs, and will therefore
lose character-level & other formatting.
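
To illustrate what keeping character-level formatting in a TU could
involve, here is a minimal sketch of one generic technique: replacing
inline tags with numbered placeholders, so that the stored text stays
easy to match while the formatting can be put back afterwards. This is
only an assumption-laden illustration, independent of any proposal made
in this thread and of whatever OmegaT actually does; the function names
are hypothetical and the sample tag is a simplified imitation of OO's
XML markup.

```python
import re

# Matches any XML-style inline tag, opening or closing
TAG = re.compile(r"<[^>]+>")


def extract_tags(segment):
    """Replace each inline tag with {0}, {1}, ... and return (text, tags)."""
    tags = []

    def replace(match):
        tags.append(match.group(0))
        return "{%d}" % (len(tags) - 1)

    return TAG.sub(replace, segment), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their numbered placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1))], text)


source = 'Press <text:span text:style-name="T1">Enter</text:span> now.'
text, tags = extract_tags(source)
print(text)   # Press {0}Enter{1} now.

# The translator (or the TM) works on the placeholder form
translated = "Appuyez sur {0}Entrée{1} maintenant."
print(restore_tags(translated, tags))
```

The sketch does not handle source text that already contains literal
"{0}"-style sequences, nor tags that the translator reorders or drops.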

At this stage, we need to know whether (and to what extent) OmegaT keeps
all such formatting (as found in OO's native XML format files). If not,
I had an idea about it, but I'll wait for your answer before detailing
it (you may have seen my post about it last week on the Free CATS dev
list).

Let me know if you agree with building on OmegaT in this way, so that we
don't need to start from scratch.

KBabel might not be the ideal tool to start with, because of its
non-portability at present and because it was purely resource
file-oriented. That said, I hope its present developer, Stanislav
Visnovsky, can follow our work, contribute to it somewhat, and later
integrate the Free CATS server into it. It would be in line with Free
CATS's aim to provide an open TM server anyway, and I hope many other
projects will do the same.

If Stanislav similarly explains to us what KBabel does for its TM
internals, I bet we'll end up with great stuff. The major thing KBabel
might need is to take into account that, ultimately, one would work
with a variety of TMs, instead of only the one used for KDE translation
(an obvious design choice when considering its original requirements).

> While KBabel won't run on Windows (because of problems with KDE
> specific libs) you shouldn't take that as a hinderance as you can
> always port the source code to a non-KDE environment for cross
> platform operation.  That would make it a completely 'different'
> application from KBabel, but at least it would be starting at the
> same point.

> You'd inherit all of it's current functionality except the user
> interface, and you don't need the UI anyways if you're planning
> on using a word processor for the translator to interface with -
> you're just reduced to the same issue of interfacing with the WP
> that you have with OmegaT.

Since we should either use a WP-based interface or help design a
new, more sophisticated one, at this stage, I believe separating your
server & client parts and seeing what may be improved (if anything) in
OmegaT's indexing & fuzzy matching features could be the most
productive thing to do.
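
To make the client/server separation a little more concrete, here is a
minimal sketch of the idea: the TM server only receives and returns
text messages, so the same server could sit behind a standalone editor
or a word-processor toolbar. Everything here is an assumption made for
illustration (class names, the JSON message format, exact-match lookup
only); it says nothing about how OmegaT or Free CATS is actually
structured.

```python
import json


class TMServer:
    """Holds the translation memory; knows nothing about any user interface."""

    def __init__(self):
        self._units = {}

    def handle(self, request_text):
        """Process one JSON request string and return a JSON reply string."""
        request = json.loads(request_text)
        if request["op"] == "store":
            self._units[request["source"]] = request["target"]
            return json.dumps({"ok": True})
        if request["op"] == "lookup":
            target = self._units.get(request["source"])
            return json.dumps({"ok": True, "target": target})
        return json.dumps({"ok": False, "error": "unknown op"})


class TMClient:
    """What an editor or toolbar would call; it only sends and receives text."""

    def __init__(self, send):
        # send is any callable taking and returning a string:
        # an in-process call here, a network connection later
        self._send = send

    def store(self, source, target):
        message = {"op": "store", "source": source, "target": target}
        return json.loads(self._send(json.dumps(message)))["ok"]

    def lookup(self, source):
        message = {"op": "lookup", "source": source}
        return json.loads(self._send(json.dumps(message)))["target"]


server = TMServer()
client = TMClient(server.handle)
client.store("Hello", "Bonjour")
print(client.lookup("Hello"))
```

Because the client only depends on the send callable, swapping the
in-process call for a socket would not change the editor-side code.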

> Because KBabel is C/C++ (I believe), you'd need to release a
> compiled binary for each supported platform (Mac, Win, Linux) which
> isn't bad although it can be annoying.  Using a derivative of OmegaT,
> you have to make sure that the end user has Java installed and enough
> memory to run it.
> I hope that makes some sense...

Stanislav posted an update to tell us about poEdit (seemingly similar
to KBabel but apparently more portable). Yet another try at a perfect
Fuzzy Matching Wonder: I mean that all these separate efforts at
implementing a fuzzy matching engine should greatly benefit from a
"feature peer review" if we want to rival proprietary products any time
soon.

> Sorry for the comments on Python versus Java previously - I reacted a
> bit strongly there.  Also, as far as your strength in the language
> goes - I wouldn't be too worried about that as there are several
> aspects to a software project that don't require coding (design
> specifications, testing, documentation, web site management, etc)
> that are more than enough to keep a couple of people busy.

As soon as experienced coders come in and join, I'm sure the present
team members will be very happy to carefully follow feature issues and
to help with all these non-coding tasks you mention. I personally am
not too bad at documentation design & localization, and our group of
translators (Breton & other) is ready to help with testing & localizing.

> Java and C++ are pretty similar (as is C#, Microsloths bastardized
> attempt to create a language to steal people away from Java).
> C++ is a bitch to learn well without knowing C, and C is probably
> the most difficult (and powerful) of programming languages to master.
> Java has a few shortcuts that C++ is missing and there are easier
> ways to come up to speed than this, but don't feel bad about being
> a bit intimidated by it!

Well, I feel quite intimidated by having to select & implement the
comprehensive environment needed in order to develop this project, and
the Linux platform seems to love getting half-lost in a maze of
different libraries sitting on top of each other. In other words, I'm
not qualified to be a software architect. That said, I liked coding
when I happened to do it some years ago, and I feel more fit for
writing bits of code in an already well-defined environment than for
setting up the whole shebang from scratch.

Among the project team members and regular Dev list followers, only
Bertrand Courté and Charles Stewart may be able to delve into Java
code. If only 10 official denials could prove me awfully wrong ;-)

So, to put it in a nutshell, I'm not able to start a new project in
Java, but if you feel like working with us along the lines suggested
above, I'm sure some of us will learn enough Java to begin reading
code, help design cute interfaces and quite possibly write a few things
here and there.

Have a nice trip back home and let us know what you decide. Up to now,
among the free CAT software projects we could see, I believe your
project looks like one of the most advanced ones and would make a solid
foundation.


Cheers,

Henri




