freecats-dev
[Freecats-Dev] Re: Free CATS - possible help


From: Henri Chorand
Subject: [Freecats-Dev] Re: Free CATS - possible help
Date: Wed, 15 Jan 2003 01:12:08 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

David,

I'm sending a copy of this e-mail (a reply to your last two) to our mailing list, as it will interest all team members. Please feel free to subscribe, even though I know you don't have time to code with us.

Good news: Savannah accepted my request to host Free Cats.

Cool!


No problem.  I wish I could do more, but you are setting out to
create a large, complex, and full featured application, and to be
fair, I just don't have time.  Do feel free, on the other hand, to
contact me, or post to comp.lang.tcl for Tcl/Tk help!

Sure, thanks. I found out about this newsgroup yesterday, while
exploring the tcl.tk web site.

You do help me by providing answers to my questions, at least the
ones concerning design issues. Here is one for you.

Considering we need to store translation units in a translation
memory (a CAT database), what do you think about implementing it on
top of an existing native XML DB server?

> Sounds logical.  I think most of them are in Java, though, which is
> an unpleasant thought.  Java tends to not play that well with the rest
> of the world, I've found.

What you say about Java confirms a feeling I had, and I'll take your word for it.

I even thought we could implement an alpha version of our TM server with flat, .INI-type files and simple string-parsing functions, just to see something running ;-)
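To make the flat-file idea concrete, here is a minimal sketch in Python (chosen only for illustration; the project language is Tcl/Tk). It assumes one INI section per translation unit with hypothetical `source` and `target` keys, and only does exact lookups; none of these names come from an actual Free CATS design.

```python
import configparser
import io


def save_tm(units, stream):
    """Write (source, target) pairs as one INI section per translation unit."""
    cp = configparser.ConfigParser(interpolation=None)
    for i, (src, tgt) in enumerate(units, start=1):
        # Section and key names are hypothetical, chosen for this sketch.
        cp[f"tu{i}"] = {"source": src, "target": tgt}
    cp.write(stream)


def load_tm(stream):
    """Read the INI data back into a list of (source, target) pairs."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_file(stream)
    return [(cp[s]["source"], cp[s]["target"]) for s in cp.sections()]


def lookup(units, sentence):
    """Exact-match lookup only; fuzzy matching is deliberately left out."""
    for src, tgt in units:
        if src == sentence:
            return tgt
    return None
```

Such a file would lose leading and trailing whitespace in segments and would not scale, so it is only good for getting something running.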

> I would start playing around with a few different things, and see
> what works best.

Well, as there are not many of us yet (speaking for the database server component), I guess the first step could consist of:
- reading the documentation
- if it seems suitable, contacting the project team to ask for help implementing our custom indexing features.

I know it seems naive, but it may work. After everything I heard about Savannah and how selective they were about accepting projects, I see their quick green light for accepting Free CATS as a good sign.

Since we chose Tcl/Tk as our main development language (following your advice), I have found out that there are several readily available components which might interest us. As you are a member of the Apache Tcl project, do you have anything to tell us about:
http://xindice-xmlrpc.sourceforge.net/

By the way, I just noticed lots of new materials at:
http://xml.apache.org/xindice/
Am I right in assuming Apache Xindice is a Java project?

As I see it with a newbie's eye, I believe it would mainly require
implementing custom indexing features, so as to be able to perform
fuzzy matching. We are working to define exactly what we need to
index within each translation unit's source segment and possible
algorithms.

Basically, for a given sentence, we need to index:
- each word in it, as well as tags (not a specific tag, but "a"
(generic) tag, as the real tag will come from another TU's source
segment) and punctuation marks
- the sequence of these items in the target segment.

Sounds logical.
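As a rough illustration of the indexing described above, the following Python sketch splits a segment into words, punctuation marks and a generic tag placeholder, keeps the token sequence, and builds an inverted index from tokens to translation units. The `SegmentIndex` class, the `<TAG>` placeholder and the use of `difflib.SequenceMatcher` as a similarity score are all assumptions for the sake of the example, not an algorithm the team has chosen.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# A tag, a run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"<[^>]+>|\w+|[^\w\s]")


def tokenize(segment):
    """Return lowercased words and punctuation; any tag becomes '<TAG>'."""
    tokens = []
    for tok in TOKEN_RE.findall(segment):
        if tok.startswith("<") and tok.endswith(">") and len(tok) > 2:
            tokens.append("<TAG>")  # generic tag, not the specific one
        else:
            tokens.append(tok.lower())
    return tokens


class SegmentIndex:
    """Hypothetical in-memory index: token -> ids of TUs containing it."""

    def __init__(self):
        self.sequences = []               # TU id -> token sequence
        self.postings = defaultdict(set)  # token -> set of TU ids

    def add(self, segment):
        tu_id = len(self.sequences)
        tokens = tokenize(segment)
        self.sequences.append(tokens)
        for tok in tokens:
            self.postings[tok].add(tu_id)
        return tu_id

    def best_match(self, segment):
        """Return (tu_id, score) of the most similar stored sequence."""
        query = tokenize(segment)
        candidates = set()
        for tok in query:
            candidates |= self.postings.get(tok, set())
        best = (None, 0.0)
        for tu_id in sorted(candidates):
            score = SequenceMatcher(None, query, self.sequences[tu_id]).ratio()
            if score > best[1]:
                best = (tu_id, score)
        return best
```

Because tags are reduced to a generic placeholder, two segments that differ only in which tag they use are scored as identical here.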

There is another, major design issue for which I would be glad to hear from you (and which we discussed at our first project team meeting last week).

We know our document working format is going to be tagged. XML, and therefore OASIS's XLIFF, seems an obvious choice, but at the same time, in our little newbies' heads, we couldn't help raising a few issues:
- The XML specification is very "theoretical" (writing a full-fledged XML parser is a hard task).
- We can't help thinking about all the existing HTML documents published so far whose structure is invalid from the point of view of XML syntax.
- We don't need/want to understand/alter the XML structure of translated documents - in fact, we want to be sure we preserve (and ignore) it.
- Could a "dumb" approach be better than a "full-fledged" one?

I mean, we don't need/want to translate documents the way an author would edit an XML document. We first thought we would select and adapt an existing (free) XML editor so as to integrate it as Free CATS's document editor. This implies we would have to deal with many complex, already integrated features which, in fact, we might be better off without. I tend to think that, for XML documents, we need to parse them in the SIMPLEST possible way, so as to identify:
- actual text contents (to be translated)
- "internal" formatting tags (to be played with, to some extent, but at the very least, we'll be able to accurately specify what we need)
- "external" (XML structure) tags (to be left untouched).

So, in fact, we're looking for a type of parser which would mark as "Don't touch" these external tags, and create a sequence of "source materials" (internal tags & text contents) which would be automatically cut into translation units (TU):
(sorry for my very limited
<TU>
a few simple data here (fuzzy matching rate)
source segment (to be translated)
<Middle of TU>
target segment (translated, to be inserted during translation)
</TU>
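A very rough Python sketch of such a "dumb" parser follows. It treats anything between `<` and `>` as a tag, decides internal versus external from a hard-coded set of tag names (`INLINE_TAGS`, an assumption made up for this example), and collects the runs of text and internal tags that lie between external tags. It ignores comments, CDATA sections and `>` characters inside attribute values, and it does not cut the runs into sentences.

```python
import re

MARKUP_RE = re.compile(r"(<[^>]+>)")

# Assumption: which tags count as "internal" formatting is project policy.
INLINE_TAGS = {"b", "i", "em", "strong", "u"}


def tag_name(tag):
    """Extract the lowercased element name from an opening or closing tag."""
    m = re.match(r"</?\s*([^\s/>]+)", tag)
    return m.group(1).lower() if m else ""


def classify(document):
    """Split a document into ('external' | 'internal' | 'text', piece) items."""
    items = []
    for piece in MARKUP_RE.split(document):
        if not piece:
            continue
        if piece.startswith("<") and piece.endswith(">"):
            kind = "internal" if tag_name(piece) in INLINE_TAGS else "external"
        else:
            kind = "text"
        items.append((kind, piece))
    return items


def source_runs(items):
    """Join text and internal tags found between external ("don't touch") tags."""
    runs, current = [], []
    for kind, piece in items:
        if kind == "external":
            run = "".join(current).strip()
            if run:
                runs.append(run)
            current = []
        else:
            current.append(piece)
    run = "".join(current).strip()
    if run:
        runs.append(run)
    return runs
```

Joining all the pieces returned by `classify` gives back the original document unchanged, which is the property we care about for the external structure.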

From the bits I understood from XML's official definition, we would only have to make sure that a given XML source document does not contain the very string which is going to represent our own, custom tags.

As "xml" is a reserved string, a quick-and-dirty hack might consist in including this very sequence as part of our own tags. That way, we may not risk meeting it as part of the source document's original contents.
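The XML 1.0 specification reserves names beginning with "xml" (in any letter case), but a parser is not required to reject them, so checking the source document anyway costs little. The Python sketch below uses made-up marker strings (`TU_OPEN`, `TU_MID`, `TU_CLOSE`) and a made-up layout for the match rate; it only illustrates the check, not a proposed file format.

```python
# Hypothetical marker strings; the real ones are still to be defined.
TU_OPEN = "<xml-freecats-tu>"
TU_MID = "<xml-freecats-mid/>"
TU_CLOSE = "</xml-freecats-tu>"


def markers_are_safe(document, markers=(TU_OPEN, TU_MID, TU_CLOSE)):
    """True if none of our markers already occurs in the source document."""
    lowered = document.lower()
    return not any(m.lower() in lowered for m in markers)


def wrap_tu(source, target="", match_rate=0):
    """Wrap one translation unit; the 'rate|' prefix is an invented layout."""
    return f"{TU_OPEN}{match_rate}|{source}{TU_MID}{target}{TU_CLOSE}"
```

If the check fails for a given document, a different marker string would have to be picked for that document.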

After that, things should be (much) simpler...

Solving this in a simple and elegant way will be a major step. In fact, we only have two "real" problems (read: "big" and tricky issues): the one I tried to describe above, and the choice of DBMS. Lots of other things must be taken care of, such as secure access to the DBMS, but they can wait.


(from your second message)
> This looks like it might be useful to you:
>
> http://www.indexdata.dk/zebra/

Oh, well... yes. Thanks a bunch, David.

This might be THE answer.
(may I insert a comment for one of the project team members:
BERTRAND, GO AND HAVE A LOOK, THIS ONE IS FOR YOU!)


Regards,

Henri




