[Freecats-Dev] Wordfast integration [Yeah man!] / BDWF (cont.)


From: Henri Chorand
Subject: [Freecats-Dev] Wordfast integration [Yeah man!] / BDWF (cont.)
Date: Thu, 20 Feb 2003 17:02:31 +0100

Yves wrote:

> > Concerning a possible integration of Free CATS server within
> > Wordfast, you did not provide an "official" statement in the
> > dev list.
>
> my answer is yes - yes to having Wordfast compatible with the
> sort of server you envision. It's a plus for WF, for your server,
> for all users.

Champagne!    :-)))

Well, that's a Good Thing, and it should encourage the less than
enthusiastic folks among us.

Kirk, the first Mac version of Wordfast should be ready within the next few
days: you can begin thinking about it, and you'll be happy to keep working
from within MS Word. FYI, Dave and Julien are (among others) experienced
Wordfast users.

> > Do you have any opinion about my proposal (following Thierry's
> > advice) to base our bilingual working document format along the
> > lines of TMX?
>
> I think it's the correct thing to do. In any case, you would have to use
> Unicode for sure, then you would have to encapsulate information
> (creation date, creator, language codes, properties) using markers
> and in the end, if you don't go the TMX way, you would re-invent it
> somehow anyway.

That's what I thought, but I'm glad to hear this from an experienced CAT
developer like Yves.
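
To make the point about encapsulated information concrete, here is a minimal sketch in Python that writes one TU with a creation date, a creator and language codes in a TMX-like form. The element and attribute names (tu, tuv, seg, creationid, creationdate, xml:lang and the header attributes) are given as I recall them from TMX 1.4 and should be checked against the specification; the tool name and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_tmx(source, target, src_lang, tgt_lang, creator, created):
    """Return a one-TU TMX-like document as a string (sketch, not validated)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "freecats-sketch",  # hypothetical tool name
        "creationtoolversion": "0.0",       # hypothetical version
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    # Creator and creation date are carried as attributes of the TU.
    tu = ET.SubElement(body, "tu", creationid=creator, creationdate=created)
    for lang, text in ((src_lang, source), (tgt_lang, target)):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(make_tmx("The cat is grey.", "Le chat est gris.",
               "en", "fr", "henri", "20030220T160231Z"))
```

Whatever the container ends up being, these are the same pieces of information Yves lists above, which is why a home-made format would tend to drift towards TMX anyway.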

> > Do you also happen to know something about various TMX
> > "flavours" (eg. RTF)?
> > Are they related to the way styles are encoded? Do you consider
> > a given one as a better choice?
>
> from this point onwards, it's a tough discussion.

I like it that way, once in a while  ;-)

> First, decide if you consider formatting information part of the
> translation, in which case you must use inline codes (like the
> <b> and </b> markers in html, for bold).
> If you use inline codes, you must opt for TMX level 2 and be
> ready for swollen file sizes: a TMX with inline codes can be
> monstrous in size (much like a Trados RTF export).

Well, sort of (manic grin here):
1) We want to keep (at least some) formatting info, because if we lose it,
then we lose something. At this stage, keeping it seems the right thing to
do.

2) Hard disk space is not a real issue, unless we store a TM's TUs
independently (each one would take up a hard disk block, around 4 KB; it's
bad, I know...) To be discussed later, anyway.

3) So I thought: why not use a simplified way of storing them WITHIN the TM
(not in the source & target document's TUs where they must remain in full
form).

The brilliant part of this crazy idea is that, each time you extract a
target segment from the TM, you convert back its "generic" tags in order to
match the ones actually found in the document's source segment.

This would mean (let's take an example here):
The <I>little</I> cat is grey.
It would be represented in the TM as:
The <X>little</X> cat is grey.
where <X> and </X> are arbitrary tags. In fact, we would need something like
the following:
<fc/> Free CATS standalone tag
<fc>  Free CATS "begin" tag
</fc> Free CATS "end" tag
with possibly a variant when a tag contains specific info enclosed, like alt
tags, so as to use:
Some text <fc2>Alt Tag text here</fc2> end of sentence.
which would not be exactly the same as:
Some text <fc>other text</fc> end of sentence.
But in fact, we may decide it's the same as far as we are concerned.

(It's up to us to arbitrarily define our own custom stuff here, as long as
it's reasonably compatible with common parsing rules.)

As found in W3C's XHTML definition:
Empty elements must either have an end tag or the start tag must end with
/>. For instance, <br/> or <hr></hr>.
See at:
http://www.w3.org/TR/2000/REC-xhtml1-20000126/

That way, our example is a 100% match of
The <B>little</B> cat is grey.
(as the latter would be represented by the same TU within the TM.)

Such mapping will be easy in the most frequent case, i.e. when the number of
generic (standalone, begin & end) tags is identical in the opened TU's source
& target segments. A basic algorithm can then assign the REAL tags (the ones
found in the source segment) to the current TU's target segment, based on the
generic tags found in the fuzzy/100% target segment returned.
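
As a rough illustration of this mapping, here is a minimal Python sketch. It assumes simple tags with no ">" inside attribute values, and it only handles the easy case where the generic tags in the TM target appear in the same order and kind as the real tags in the document's source segment (a stricter condition than just comparing counts). The function names are hypothetical.

```python
import re

# A simple markup tag such as <I>, </I> or <br/> (assumption: no ">" inside attributes).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>")
GENERIC_RE = re.compile(r"</fc>|<fc/>|<fc>")


def kind(tag):
    """Classify a tag as 'end', 'standalone' or 'begin'."""
    if tag.startswith("</"):
        return "end"
    if tag.endswith("/>"):
        return "standalone"
    return "begin"


def generalize(segment):
    """Replace each real tag by <fc>, </fc> or <fc/>; return (generic_text, real_tags)."""
    real_tags = []

    def repl(match):
        tag = match.group(0)
        real_tags.append(tag)
        return {"begin": "<fc>", "end": "</fc>", "standalone": "<fc/>"}[kind(tag)]

    return TAG_RE.sub(repl, segment), real_tags


def restore(generic_target, source_tags):
    """Put the source segment's real tags back into a generic target segment.

    Returns None when the tag sequences differ, so that the caller can fall
    back to something smarter (or to the translator).
    """
    generic_tags = GENERIC_RE.findall(generic_target)
    if [kind(t) for t in generic_tags] != [kind(t) for t in source_tags]:
        return None
    remaining = iter(source_tags)
    return GENERIC_RE.sub(lambda m: next(remaining), generic_target)


# Both formatted variants collapse to the same TM key.
key, tags = generalize("The <B>little</B> cat is grey.")
print(key)
print(restore("Le <fc>petit</fc> chat est gris.", tags))
```

With this, the <I> and <B> versions of the example produce the same generic source segment, and the target segment retrieved from the TM gets back whichever real tags the open document actually uses.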

> With Wordfast, I went for a very much lighter standard - a Wordfast
> TM is typically 4 times smaller than a corresponding TMX level 1, or
> than a Trados one.

Well, after all, the only thing you risk here is convincing us your solution
is better, and then putting your file format under the GPL ;-)

> All necessary information is kept in an intrinsically explicit way,
> without the need for mark-up (I use a columned structure where
> every column carries a specific info, rather than using mark-up).

A sort of CSV file? Could you please provide an example here?

> One good point is you open this with Excel or Word or Access
> or NotePad or OpenOffice and immediately see the info (who
> created the TU, when, what languages etc).
> Translators without computer knowledge can manipulate these
> TUs with copy-paste etc.

This looks very good.

> I don't see drawbacks in this. Someone may point out that the
> WF format is not extensible like TMX, although I did extend it
> between Wordfast version 2 and 3 by adding TU attributes,
> keeping both upward and downward compatibility (it just
> meant adding more columns. WF1 and 2 ignore the extra
> columns, WF3 uses them. No need for conversions).

:-)
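
While waiting for a real example from Yves, here is a minimal Python sketch of how a columned format can be extended without breaking older readers. The column names and their order are purely hypothetical and this is NOT the actual Wordfast TM layout; the sketch only illustrates the "older versions ignore the extra columns" idea on a tab-delimited file.

```python
import csv
import io

# Hypothetical base layout, one TU per line, tab-delimited.
BASE_COLUMNS = ["created", "creator", "src_lang", "source", "tgt_lang", "target"]


def read_tus(text, columns=BASE_COLUMNS):
    """Read TUs, keeping only the columns this reader knows about.

    zip() stops at the shorter sequence, so extra trailing columns written
    by a newer version are silently ignored by an older reader.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(columns, row)) for row in reader if row]


old_file = "20030220\tHC\tEN\tThe cat is grey.\tFR\tLe chat est gris.\n"
# Same TU as written by a newer version, with one extra attribute column.
new_file = "20030220\tHC\tEN\tThe cat is grey.\tFR\tLe chat est gris.\tclientX\n"

print(read_tus(new_file) == read_tus(old_file))
print(read_tus(new_file, BASE_COLUMNS + ["attr1"])[0]["attr1"])
```

An older reader sees exactly the same TU in both files, while a newer reader that knows about the extra column can use it, with no conversion step in either direction.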

> Wordfast does NOT use inline codes (does not "remember"
> formats, like font attributes). Been a lot of talk in our discussion
> group. Vast majority agrees the loss is very minimal. (WF
> makes every effort to duplicate the source segment's layout
> to the proposed match)

Maybe you would see my idea above as an enhancement; then we could use a
Best of Both Worlds approach :-)

> Have you decided on this issue? It's a capital one, once the
> choice is made, it would be very difficult to change it.

Definitely.
No, we only have my above suggestion to work on (also covered in my DB
Indexing document), but I'm still waiting for anybody's feedback on this.

<Loudspeaker mode ON>
PLEASE, Ladies and Gentlemen, tell us what you think - at least once Yves
provides feedback on my above ideas and gives an example of what he's doing,
so that we understand it better and can make up our minds.
</Loudspeaker mode OFF>


Regards,

Henri




