freecats-dev


[Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English


From: Keith Godfrey
Subject: [Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English
Date: Tue, 25 Feb 2003 15:11:38 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01

Henri Chorand wrote:

As translators, our first aim was to provide a full-fledged standalone
translation editor, because it might be the most productive solution. We
then quickly realized that we would need as many conversion filters as
possible in order to be able to translate whatever customers require, and we thought about the huge job done by the Open Office team. We soon realized their conversion filters would have to be integrated into the Free CATS client(s).


Building a standalone translation editor of good quality, capable of representing source files of different natures reasonably well (forget accurately), will be a significant undertaking, imho, and one that will be very difficult to accomplish in a cross-platform manner. One possible solution would be to take an open source word processor that has some filters to start with (such as Abiword) and build upon that, but then you're tying yourself to a specific platform. On the plus side, you've already got a solid infrastructure to start from. A CAT tool embedded within OpenOffice.org, if it were possible, might be the best solution, but I have not heard whether such an approach would work.


There are two options there:
- We find a way to build this interactive translation editor (this means
we have to adapt Open Office's filters)
- We build a tool that works from within OO, like Trados with MS Word (we can reuse these filters without any extra work).

In an ideal world, translators might ask for both a standalone
translation editor (like OmegaT) and integration within a word processor.


My background suggests that it would be better to focus on one technique or the other - trying to satisfy both will be significantly more complex and will likely never be completed.



I can't pretend we delved deeply into OO's internals, but we found out
the following:
- OO has no macro language. Something may be done at a later stage.
- OO's API is well documented, so it might be rather doable to do
something, especially if we only implement a toolbar calling a set of
external functions.


Instead of using Abiword, it might be worth considering making a custom build of OOo with built-in Trados-like features (and porting the parts of other open source tools, such as OmegaT, that would be needed to make the CAT side work).



The other solution we see is (assuming we start from OmegaT, which is an option I would personally favour):
- Separate client & TM server features in OmegaT
- Design a more sophisticated GUI interface (I believe we can bring a
number of clever ideas here)


One thing to consider - all of the file filters in OmegaT would require a complete rewrite if text style information needs to be extracted, and separate output filters would need to be created if the user is allowed to modify the file formatting. OmegaT's filters are reasonably simple - they extract bits of text from a stream of data (the source file) and simply replace that text with translated text when writing the translated file. That method strictly enforces that no formatting changes happen outside of a proper word processor. Unless one goes with a high quality word processor (such as OOo), it may be dangerous to try to modify formats - clients may end up with files that don't work for them (I've spent several years as a localization engineer and have seen plenty of corrupted files - another reason for OmegaT's strict formatting policy).
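
To make the extract-and-replace idea concrete, here is a minimal sketch in Python. It is not OmegaT's actual code (OmegaT is written in Java and its filters are format-specific); the function names and the assumption that anything in angle brackets is markup are hypothetical, chosen only for illustration. The point is that markup is copied through untouched and only the text runs are swapped.

```python
import re

# Assumption for this sketch: anything in angle brackets is markup.
TAG = re.compile(r"(<[^>]*>)")

def split_stream(data):
    """Split the source into (is_text, chunk) pairs, keeping markup verbatim."""
    parts = []
    for chunk in TAG.split(data):
        if not chunk:
            continue
        # A chunk is translatable text if it is not a tag and not only whitespace.
        is_text = TAG.fullmatch(chunk) is None and chunk.strip() != ""
        parts.append((is_text, chunk))
    return parts

def extract_text(data):
    """Return the translatable text runs, stripped of surrounding whitespace."""
    return [chunk.strip() for is_text, chunk in split_stream(data) if is_text]

def write_translated(data, translations):
    """Rebuild the file, replacing each text run and copying everything else."""
    out = []
    for is_text, chunk in split_stream(data):
        if is_text:
            core = chunk.strip()
            # Untranslated runs fall back to the source text.
            out.append(chunk.replace(core, translations.get(core, core), 1))
        else:
            out.append(chunk)
    return "".join(out)

source = "<p>Hello <b>world</b></p>"
print(extract_text(source))
print(write_translated(source, {"Hello": "Bonjour", "world": "monde"}))
```

Because the writer never generates markup of its own, a file with no translations comes back identical to the source, which is the formatting guarantee described above.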



We would all very much appreciate if you could provide us with a
general description of OmegaT's indexing & fuzzy matching features.


See my other email for this.




At this stage, we need to know IF (and to which extent) OmegaT keeps all such formatting (as found in OO's native XML format files).


As mentioned above, OmegaT ignores all formatting information - its only interest is whether the formatting tags are 'hard' (like paragraph boundaries) or 'soft' (like formatting boundaries). The tags are discarded after identification.
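
The hard/soft distinction can be sketched roughly as follows. This is an illustration under stated assumptions, not OmegaT's implementation: the tag pattern, the `HARD_TAGS` set and the `segments` function are all hypothetical, and HTML-like markup is assumed. A hard tag ends the current segment; a soft tag is dropped and the text on either side is joined.

```python
import re

# Assumed tag syntax: <name ...> or </name>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

# Hypothetical set of 'hard' tags; everything else is treated as 'soft'.
HARD_TAGS = {"p", "br", "div", "li", "h1", "h2", "td"}

def segments(data):
    """Split data into segments at hard tags, discarding all tags."""
    segs = []
    current = []
    pos = 0
    for match in TAG.finditer(data):
        current.append(data[pos:match.start()])
        pos = match.end()
        if match.group(2).lower() in HARD_TAGS:
            # Hard tag: close the segment collected so far.
            text = "".join(current).strip()
            if text:
                segs.append(text)
            current = []
        # Soft tag: nothing to do, the tag is simply not kept.
    current.append(data[pos:])
    text = "".join(current).strip()
    if text:
        segs.append(text)
    return segs

print(segments("<p>Hello <b>big</b> world</p><p>Second one</p>"))
```

Note that the bold markup leaves no trace in the first segment, which is why style information cannot be recovered from segments produced this way.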

BTW - I'm not subscribed to the list so if someone wishes to contact me, please do so directly.

Keith




