[Freecats-Dev] Re: Free CATS

From:	Henri Chorand
Subject:	[Freecats-Dev] Re: Free CATS - possible help
Date:	Wed, 15 Jan 2003 01:12:08 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

David,

I'm sending a copy of this e-mail (a reply to your last two) to our mailing list, as it will interest all team members. Please feel free to subscribe, even though I know you don't have time to code with us.

Good news: Savannah accepted my request to host Free Cats.


Cool!

No problem.  I wish I could do more, but you are setting out to
create a large, complex, and full featured application, and to be
fair, I just don't have time.  Do feel free, on the other hand, to
contact me, or post to comp.lang.tcl for Tcl/Tk help!


Sure, thanks. I found out about this newsgroup yesterday, while exploring
the tcl.tk web site.

You do help me by providing answers to my questions, at least the
ones concerning design issues. Here is one for you.

Considering we need to store translation units in a translation
memory (a CAT database), what do you think about implementing it on
top of an existing native XML DB server?


> Sounds logical.  I think most of them are in Java, though, which is
> an unpleasant thought.  Java tends to not play that well with the rest
> of the world, I've found.

What you say about Java confirms a feeling I had, and I'll take your word for it.

I even wondered whether we could implement an alpha version of our TM server with flat, .INI-type files and simple string parsing functions, just to see something running ;-)
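To make the flat-file idea concrete, here is a minimal sketch in Python (used here only for illustration; the project itself has settled on Tcl/Tk). It assumes one INI section per translation unit with hypothetical "source" and "target" keys, and it only supports exact lookups, not fuzzy matching.

```python
import configparser
import io


def save_tm(units):
    """Serialise (source, target) pairs to INI text, one section per unit."""
    # interpolation=None so that "%" in a segment is stored literally.
    parser = configparser.ConfigParser(interpolation=None)
    for number, (source, target) in enumerate(units, start=1):
        # Section and key names are hypothetical, chosen for this sketch.
        parser[f"TU{number}"] = {"source": source, "target": target}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def load_tm(text):
    """Read the INI text back into a list of (source, target) pairs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [(parser[name]["source"], parser[name]["target"])
            for name in parser.sections()]


def lookup(units, sentence):
    """Return the stored translation of an identical source segment, if any."""
    for source, target in units:
        if source == sentence:
            return target
    return None
```

Such a format would probably not scale, and leading or trailing whitespace in a segment is lost on reload, but it would be enough to see something running.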

> I would start playing around with a few different things, and see
> what works best.

Well, as there are not that many of us yet (speaking for the database server component), I guess the first step could consist of:

- reading the documentation

- if it seems suitable, contacting the project team to ask for help implementing our custom indexing features.

I know it seems naive, but it may work. After everything I heard about Savannah and how selective they are about accepting projects, I see their quick green light for Free CATS as a good sign.

Since (following your advice) we chose Tcl/Tk as the main development language, I found out there are several readily available components which might interest us. As you are a member of the Apache Tcl project, do you have anything to tell us about:

http://xindice-xmlrpc.sourceforge.net/

By the way, I just noticed lots of new materials at:
http://xml.apache.org/xindice/
Am I right in assuming Apache Xindice is a Java project?

As I see it with a newbie's eye, I believe it would mainly require
implementing custom indexing features, so as to be able to perform
fuzzy matching. We are working to define exactly what we need to
index within each translation unit's source segment and possible
algorithms.

Basically, for a given sentence, we need to index:
- each word in it, as well as tags (not a specific tag, but "a"
(generic) tag, as the real tag will come from another TU's source
segment) and punctuation marks
- the sequence of these items in the target segment.
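The indexing idea above could be sketched roughly as follows, in Python for illustration only. The tokenizer, the "<TAG>" placeholder, the SegmentIndex class and the 0.7 threshold are all assumptions made for this sketch, and difflib's similarity ratio merely stands in for whatever fuzzy matching algorithm the team ends up defining.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# A tag, a run of word characters, or a single punctuation mark.
TOKEN = re.compile(r"<[^>]+>|\w+|[^\w\s]")


def tokenize(segment):
    """Split a segment into lowercased words, punctuation and generic tags."""
    tokens = []
    for item in TOKEN.findall(segment):
        if len(item) > 2 and item.startswith("<") and item.endswith(">"):
            tokens.append("<TAG>")  # any tag is indexed as "a" generic tag
        else:
            tokens.append(item.lower())
    return tokens


class SegmentIndex:
    """Inverted index over tokens, plus the token sequence of each segment."""

    def __init__(self):
        self.sequences = []               # token sequence per segment id
        self.postings = defaultdict(set)  # token -> ids of segments using it

    def add(self, segment):
        tokens = tokenize(segment)
        segment_id = len(self.sequences)
        self.sequences.append(tokens)
        for token in tokens:
            self.postings[token].add(segment_id)
        return segment_id

    def search(self, segment, threshold=0.7):
        """Return (segment_id, ratio) pairs, best match first."""
        query = tokenize(segment)
        candidates = set()
        for token in query:
            candidates |= self.postings.get(token, set())
        results = []
        for segment_id in candidates:
            matcher = SequenceMatcher(None, query, self.sequences[segment_id])
            ratio = matcher.ratio()
            if ratio >= threshold:
                results.append((segment_id, ratio))
        return sorted(results, key=lambda pair: (-pair[1], pair[0]))
```

The inverted index only narrows the search down to segments sharing at least one item with the query; the order of items is then compared on the stored sequences.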


Sounds logical.

There is another major design issue on which I would be glad to hear from you (and which we discussed at our first project team meeting last week).

We know our document working format is going to be tagged. XML, and therefore Oasis' XLIFF, seems an obvious choice, but at the same time, in our little newbies' heads, we couldn't help raising a few issues:

- The XML specification is very "theoretical" (writing a full-fledged XML parser is a hard task).

- We can't help thinking about all the HTML documents published so far whose structure is invalid from the point of view of XML syntax.

- We don't need/want to understand/alter the XML structure of translated documents - in fact, we want to be sure we preserve (and ignore) it.

- Could a "dumb" approach be better than a "full-fledged" one?

I mean, we don't need/want to translate documents the way an author would edit an XML document.

We first thought we would select and adapt an existing (free) XML editor so as to integrate it as Free CATS's document editor. This implies we would have to deal with many complex, already integrated features without which, in fact, we might be better off.

I tend to think that, for XML documents, we need to parse them in the MOST simple way, so as to identify:

- actual text contents (to be translated)

- "internal" formatting tags (to be played with, to some extent, but at the very least, we'll be able to accurately specify what we need)

- "external" (XML structure) tags (to be left untouched).

So, in fact, we're looking for a type of parser which would mark these external tags as "Don't touch", and create a sequence of "source materials" (internal tags & text contents) which would be automatically cut into translation units (TU):

(sorry for my very limited
<TU>
a few simple data here (fuzzy matching rate)
source segment (to be translated)
<Middle of TU>
target segment (translated, to be inserted during translation)
</TU>
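As a rough illustration of such a "dumb" parser, here is a minimal sketch in Python. The set of "internal" tag names is a pure assumption for the example, as are the "keep" and "source" labels; comments, CDATA sections and attribute values containing ">" are deliberately not handled.

```python
import re

TAG = re.compile(r"(<[^>]+>)")
TAG_NAME = re.compile(r"</?\s*([A-Za-z][\w:.-]*)")

# Hypothetical list of inline formatting tags treated as "internal".
INTERNAL_TAGS = {"b", "i", "u", "em", "strong", "span"}


def is_internal(tag):
    """True if the tag is one of the inline formatting tags listed above."""
    match = TAG_NAME.match(tag)
    return bool(match) and match.group(1).lower() in INTERNAL_TAGS


def split_document(document):
    """Cut a document into ("keep", ...) and ("source", ...) chunks, in order.

    "keep" chunks (external tags, whitespace between them) must be left
    untouched; "source" chunks hold text and internal tags to be translated.
    """
    chunks = []
    run = []

    def flush():
        text = "".join(run)
        if text.strip():
            chunks.append(("source", text))
        elif text:
            chunks.append(("keep", text))  # whitespace only: keep verbatim
        run.clear()

    # With a capturing group, re.split puts the tags at odd positions.
    for position, piece in enumerate(TAG.split(document)):
        if not piece:
            continue
        if position % 2 == 1 and not is_internal(piece):
            flush()
            chunks.append(("keep", piece))
        else:
            run.append(piece)
    flush()
    return chunks
```

Joining all chunks in order gives back the original document byte for byte, which is the property we care about most: the external structure is preserved without ever being understood.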

From the bits I understood of XML's official definition, we would only have to make sure that a given XML source document does not contain the very strings which are going to represent our own, custom tags.

As "xml" is a reserved string, a quick-and-dirty hack might consist in including this very sequence as part of our own tags. That way, we should not risk meeting them as part of the source document's original contents.
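The check itself would be trivial. A minimal Python sketch, assuming purely hypothetical marker strings (the real Free CATS markers are not defined yet):

```python
# Hypothetical markers containing the reserved "xml" sequence.
TU_MARKERS = ("<xml-freecats-tu>", "<xml-freecats-middle>", "</xml-freecats-tu>")


def find_collisions(document, markers=TU_MARKERS):
    """Return the markers that already occur verbatim in the source document."""
    return [marker for marker in markers if marker in document]


def wrap_unit(source, target="", markers=TU_MARKERS):
    """Wrap a source segment and its (possibly empty) target in TU markers."""
    opening, middle, closing = markers
    return f"{opening}{source}{middle}{target}{closing}"
```

A document would only be segmented if find_collisions returns an empty list for it.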


After that, things should be (much) simpler...

Solving this in a simple and elegant way will be a major step. In fact, we only have two "real" problems (read: "big" and tricky issues): the one I tried to describe above, and the DBMS choice issue. Lots of things must be taken care of, like secure access to the DBMS, but they can wait.



(from your second message)
> This looks like it might be useful to you:
>
> http://www.indexdata.dk/zebra/

Oh, well... yes. Thanks a bunch, David.

This might be THE answer.
(may I insert a comment for one of the project team members:
BERTRAND, GO AND HAVE A LOOK, THIS ONE IS FOR YOU!)


Regards,

Henri
