Re: [Freecats-Dev] OmegaT
From: Henri Chorand
Subject: Re: [Freecats-Dev] OmegaT
Date: Thu, 27 Mar 2003 23:09:31 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Hi,
Here are a few thoughts prompted by Keith's last message:
> Something inspired me to take a rather critical look at the
> direction of the project specification and to provide some
> information from a software developer's perspective (which
> is quite different than how a translator sees a CAT tool).
Sure. We definitely need this sort of feedback.
> In summary, I think the project needs to be reined in a bit
> and it needs a narrower focus. I might be a little frank in
> certain areas, so consider yourself forewarned!
Yep. As I already said, captain, there is no software architect yet on
this ship, so the seat is vacant.
> Also, Yves and others might not fully agree with me, but you
> know what they say, opinions are like a$$holes - everybody's
> got one ;-)
Well, now that I posted your message (at last), everybody has a chance
to say something :-)
>> Also, for now, if we examine Free CATS' list of aimed features,
>> the ONLY request that we should ask Keith to do in OmegaT is to
>> split its client & server parts via an API, if only to allow:
>> - a multi-user mode with HTTP access
>> - other clients to use the server component.
> <snip>
>> If Keith goes along the lines of what we're asking him above,
>> then I'm personally ready to drop my Free CATS project coordinator
>> hat and I'll be happy to help with what I do best.
> I'm a little confused here (maybe because I haven't read the specs
> yet... forgive me father!).
> What exactly qualifies as the OmegaT server? If you're talking of
> having FreeCATS as an embedded component of an industrial strength
> word processor, which I do believe is on the wish list, then we're
> only talking about a fuzzy matching engine because the file
> management will be much better handled by the word processor.
We haven't exactly decided yet if we were to prefer a word-processor
embedded solution or a standalone editor-based one.
We translators well know the pros and cons of these, which is why we
would, ideally, want both ;-))
Once we agree on the client and server parts, we may have several
clients using the same TM server. Also, some of these clients may even
be developed by external teams for whatever dedicated uses.
Another reason is that we deeply need a multi-user TM server (for
industrial-strength CAT projects), and we thought HTTP access would be
so nice. Our company is small, yet we often work in multi-user mode (up
to 6 translators on the same project). We would be very happy to see a
TM server that accepts between 10 & 20 simultaneous users on a dedicated
server with a standard 512/128 DSL line. If such a free tool ever comes
up, I bet it's going to be a major success.
For instance, if we start from OmegaT and go in this direction, we may:
- agree on an API that fits your code as well as other tools (Yves will
certainly suggest interesting things)
- draw a line within your code modules (classes) so as to decide which
is needed by a translation client and/or a TM server.
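To make the client/server line a little more concrete, here is a minimal sketch of what such a boundary could look like. Everything in it is an assumption made for illustration: the `TMServer` class, the `handle_request` function and the JSON message layout are hypothetical and are not taken from OmegaT or from our specification. The point is only that a client needs nothing more than a small set of operations, whatever transport (HTTP or other) carries them.

```python
import json


class TMServer:
    """Hypothetical in-memory translation memory standing in for the server part."""

    def __init__(self):
        self._units = {}  # source segment -> target segment

    def store(self, source, target):
        """Record (or overwrite) the translation of one source segment."""
        self._units[source] = target

    def lookup(self, source):
        """Return the stored translation for an exact match, or None."""
        return self._units.get(source)


def handle_request(server, raw):
    """Dispatch one JSON request string and return a JSON reply string.

    The message layout ("op", "source", "target") is invented for this sketch.
    """
    request = json.loads(raw)
    if request["op"] == "store":
        server.store(request["source"], request["target"])
        return json.dumps({"ok": True})
    if request["op"] == "lookup":
        return json.dumps({"ok": True, "target": server.lookup(request["source"])})
    return json.dumps({"ok": False, "error": "unknown op"})
```

Any client (a standalone editor, a word processor plug-in, a third-party tool) would only have to produce and read such messages, and several clients could share one server.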
> If you're talking about a stand-alone component capable of being
> slaved to a web server, then we're talking about stripping the UI
> off OmegaT and expanding the selection of file filters.
As we believe the files to be translated are to reside on the
translation client's local filesystem, the file filters will be managed
at the client level.
> Either of these goals is quite possible to achieve on its own, but
> a decision has to be made as to which one to pursue.
Sure. We chose the server, because we believe it's important to create a
"proof-of-concept" and to test the technology in the wild.
You may have read that Yves Champollion promised to make a future
version of WordFast compatible with our server. We could later undertake
the development of a standalone translation client and/or of an Open
Office plug-in.
> To accomplish both of them together will practically require
> advance knowledge about how both are to be designed and work, and
> will take about as much effort as splitting OmegaT into two separate
> and independent applications and supporting both.
> In general, the more 'flexible' you make software, the more difficult
> it is to design, build and maintain, and often the end product is
> such a series of compromises (like Windows) that, while doing
> practically everything, it does nothing well.
This is a sound warning.
> I realize that one of the benefits of a public forum for FreeCATs
> is to work out what is possible and what is not, but my feeling
> is that there are too many people adding on the wish list, most
> with very valid desires, but not enough committed developers to
> bring the list back down to earth, so the resultant spec is growing
> in complexity to something that can never practically be achieved.
Apart from HTTP access, everything that was included in our
specification documents has been implemented in one or more proprietary
CAT tools.
> Many elements in existing CAT tools are not dictated by what
> developers think is best for the translator, but what the developers
> can reasonably accomplish given available time, tools, hardware and
> technology.
True, which is why, if Free CATS is to succeed, it will begin with
something small that will be strong enough to evolve later, once it has
attracted a lot of users.
> Take for example the desire to have full movement of segment markers.
> This is a very important item for many translators, but what are they
> willing to give up for such flexibility? If one wishes to have the
> seemingly trivial ability to move segment markers beyond the equivalent
> of hard return boundaries, then one restricts oneself to operating
> entirely within an industrial strength word processor. (I can go into
> details if you wish, but supporting that ability in a CAT tool will
> require building and maintaining some very complex and complete file
> filters to support the arbitrary formatting changes such would require,
> and also the necessary infrastructure to support such filters - a rather
> non-trivial task. After doing this, you might as well extend the UI and
> you'll end up with a fully functional word processor).
Well, this is exactly the kind of advice we're looking for.
> Moving segment markers within structural boundaries is relatively
> simple, compared with the previous task, but it also has trade-offs,
> primarily in string recall, fuzzy matching ability and performance. When
> a segment marker is moved, a new 'string' is created, and the entire
> database must be searched for strings identical or similar to this one
> before information can be provided to the translator (there is no
> pre-processing which can anticipate such resegmenting).
Sure. Even so, like with Trados and WordFast, a decent user
parameterization of segmenting should reduce the frequency of
resegmenting operations to a very low value. At this stage, if it takes
time (CPU), never mind.
We don't pretend it's needed every couple of sentences, only that it
will be needed from time to time. I personally translate a lot and
rarely resegment text. When I do it, it's often because of Trados
inefficiencies, but I would feel very unhappy if I was not allowed to.
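To illustrate the cost Keith describes, here is a minimal sketch of what has to happen when a segment is changed by hand: the new string was never seen before, so nothing can be pre-computed for it and every stored segment must be compared against it. The `fuzzy_matches` function, the dictionary used as memory and the 0.7 threshold are assumptions chosen for this example; the similarity measure is Python's standard `difflib.SequenceMatcher`, not the algorithm used by OmegaT or any other CAT tool.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.7):
    """Compare a freshly created segment against every stored source segment.

    memory maps source segments to target segments. Returns a list of
    (score, source, target) tuples with score >= threshold, best match first.
    """
    results = []
    for source, target in memory.items():  # full scan: cost grows with TM size
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            results.append((score, source, target))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

With rare resegmenting, such a full scan only runs occasionally, which is why I consider the CPU time acceptable at this stage.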
My only "harsh" criticism of OmegaT at this stage is that, as explained
by Marc, OmegaT only allows paragraph-level segmenting.
We know that a large number of translation agency customers expect us to
be able to use more sophisticated segmenting features than a fixed
paragraph-level only.
> (...) I'm not a Trados user, but I do recall hearing about serious
> performance degradations as the translation memory size grows,
> presumably because of the resegmenting ability (or alternatively,
> just a poor search design). OmegaT [currently] has fixed segment
> markers, but by doing this it can provide fuzzy matching
> information from databases literally of biblical proportions
> (early design estimates assumed the translation memory could grow
> to somewhere near the size of the Bible and it wouldn't degrade
> performance, assuming the machine had sufficient memory).
Sure. Anyway, I don't think our points of view are really contradictory.
Basically, we translators mostly need to perform segmenting at sentence
level. Enabling the user to activate one or more out of several
optional, pre-defined delimiters (Tab, ":", ". ", "[line break]", ".[CR]",
"[CR]") - and to be able to interactively modify proposed segmentation
in statistically rare circumstances - seems to me a reasonable feature
to implement.
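As a rough illustration of the feature described above, here is a minimal sketch of sentence-level segmenting driven by a user-selected set of delimiters, with a manual correction step. The names (`DELIMITERS`, `segment`, `merge_with_next`) are hypothetical, and treating "[CR]" and "[line break]" as a plain newline character is an assumption of this example, not a decision of the project.

```python
import re

# Optional, pre-defined delimiters the user may switch on (regex fragments).
DELIMITERS = {
    "tab": r"\t",
    "colon": r":",
    "period_space": r"\. ",
    "period_break": r"\.\n",
    "line_break": r"\n",
}


def segment(text, enabled):
    """Split text after each enabled delimiter, keeping the delimiter attached.

    Chunks that contain only whitespace are dropped.
    """
    if not enabled:
        return [text] if text else []
    pattern = "(" + "|".join(DELIMITERS[name] for name in enabled) + ")"
    parts = re.split(pattern, text)  # [text, delimiter, text, delimiter, ..., text]
    segments = []
    for i in range(0, len(parts), 2):
        delimiter = parts[i + 1] if i + 1 < len(parts) else ""
        chunk = parts[i] + delimiter
        if chunk.strip():
            segments.append(chunk)
    return segments


def merge_with_next(segments, index):
    """Manual correction: join segment `index` with the one that follows it."""
    merged = segments[index] + segments[index + 1]
    return segments[:index] + [merged] + segments[index + 2:]
```

The automatic pass would handle the common case, and `merge_with_next` stands for the statistically rare interactive adjustment, for instance when an abbreviation such as "e.g. " has been split by mistake.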
> (...). Unnecessary overkill? Oh yes. But it was a design
> trade-off in the interest of simplicity and performance that
> (1) greatly assisted in OmegaT actually seeing the light of day and
> (2) enabled OmegaT to function under the significant performance
> penalty of running under a generic cross-platform architecture.
> It was _NOT_ because that's how I thought translators would
> work best (...)
Sure.
> I may adjust OmegaT to support modifiable segments in the future
> (it is on my to-do list) but my emphasis will remain on performance
> and in not implementing hack solutions. My available time is split
> between several projects right now so I offer no timeframes.
As I was only asking you to consider this issue, I'm very satisfied with
your answer.
> Finally, I think a committed developer (or developers) to actually
> own the project needs to be found, maybe even a recent college grad
> from Russia or Pakistan looking for experience. If early releases
> of FreeCATS are sufficiently promising, and if it becomes necessary,
> it will be easier to raise 'donations' to support someone in one of
> the eastern countries to enable them to continue development.
> Raising sufficient $$ to influence a western developer is not
> realistic - you'll be bound by their available time and, most
> importantly, their interest in working on such a group project.
> I'm not saying this donation concept will be necessary, but one
> might at least consider it as a possibility.
1) Money - Donation option
We thought about that. A collective donation is something we're ready
to organize among ourselves.
At this stage, it would be very useful if you could provide us with the
following:
- An estimate of the workload required (in man-days) for you to code the
next round of new features (interactive segmenting & splitting the
client and server portions, TMX support and other suggestions you
certainly have) - adding a little margin for debugging (with our help).
- The daily price you would request for doing this work.
Just to give a crude estimate, 200 translators each donating 50
Euros/USD would make 10,000 Euros/USD available.
Kirk already collected a list of translators' mailing lists which we can
use to raise interest - and funds - once we clearly define our goals and
spend some time in "marketing" the whole idea.
2) Development resources - other option
Also note that, weak in coding skills as it now is, our present team
might be able to help as follows.
We have recently contacted two teachers at a famous French engineering
school and they seemed eager to help us. They suggested they could help
by providing a few man-months of development time by final-year students.
As you would be able to direct their efforts, it could prove very
valuable, at least for some parts of the job.
Of course, they have yet to decide if our project is worth it, but I
consider that if we tell them we want to join forces with you and Marc
(I'll come back quickly to Marc's message), it means we won't be
starting from scratch, but from OmegaT 1.0.2.
I believe it will make quite a difference in terms of credibility.
You can also be sure that we can help by publicizing your project and
bringing attention, testers, documentation writers (me for instance,
along with Kirk & several others) & so on.
So, let us know your thoughts,
Henri