[Freecats-Dev] KBabel option - what's at stake


From: Henri Chorand
Subject: [Freecats-Dev] KBabel option - what's at stake
Date: Wed, 19 Feb 2003 19:41:25 +0100

Hi all,

In case any of you were wondering why I remained silent today: I've been
away most of the day (I was giving a course at Brest's UBO this afternoon,
and had to prepare it as well).

This morning, I thought:
well, this mailing list is now very much alive :-)
Now that I see today's 15 messages, I'll read each of them carefully, but
obviously a few things may need to be restated.
Here are (only) my answers to Stanislav's first message of today.

<Project Leader Mode ON>
1) I sincerely hope Free CATS can be built on top of a number of successful
free software projects. It seems good practice (it follows common sense),
and it is consistent with the way free software works.

2) We have JUST STARTED seriously assessing existing projects of interest
(mostly OmegaT & KBabel), so I won't accept what I consider premature
conclusions. We must NOT forget that there are often many ways to reach a
given goal, and they may differ in other respects. Let's carry on this work
so that it brings results, one way or another.

3) While we, the project team, have (broadly) determined our goals, a lot of
work remains to be done before we can say our specification documents are
finished - and only then are we supposed to begin coding, folks, so let's
remain modest.
<Project Leader Mode OFF>

So, later on tonight, I'll read everything else that was written today
before answering the other messages. I hope I can quickly summarize their
contents into what I endorse and what I don't.

> > The things to consider are:
> >
> > 1) .PO files
> > (...)
>
> As of version 1.2, KBabel will support import/export filters; ATM we open
> PO files and Qt Linguist files.
>
> BTW, the PO file is the standard format of GNU gettext and the de facto
> standard for translating Linux software. In the open source world I'm aware
> of only two other formats: Mozilla's and OpenOffice's. You can even use PO
> files to translate PHP (I've never done this myself though).

This is good to know. At this stage, I'm confident in KBabel's ability to
process all these Linux-specific (sorry for lacking a better term) resource
files.
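For readers less familiar with gettext: a PO file pairs each source string
(msgid) with its translation (msgstr), and a long string may continue over
several quoted lines. A minimal sketch of reading such entries follows; it is
a simplified illustration, not gettext's own parser, and it assumes no plural
forms, no msgctxt, and only the \" and \n escapes.

```python
import re

# A quoted PO string on its own, e.g. "Open file" (quotes included).
QUOTED = re.compile(r'^"(.*)"$')

def unescape(s):
    # Only the two most common escapes are decoded in this sketch.
    return s.replace('\\n', '\n').replace('\\"', '"')

def parse_po(text):
    """Map each msgid to its msgstr (simplified: no plurals, no msgctxt)."""
    entries = {}
    fields = {"msgid": [], "msgstr": []}
    current = None  # field that quoted continuation lines belong to

    def flush():
        msgid = "".join(fields["msgid"])
        if msgid:  # the header entry has an empty msgid and is skipped
            entries[msgid] = "".join(fields["msgstr"])
        fields["msgid"].clear()
        fields["msgstr"].clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, comments, "#: file:line", "#, fuzzy"
        keyword, _, rest = line.partition(" ")
        if keyword in fields:
            if keyword == "msgid" and current is not None:
                flush()  # a new entry begins: store the previous one
            current = keyword
            line = rest.strip()
        match = QUOTED.match(line)
        if current is not None and match:
            fields[current].append(unescape(match.group(1)))

    if current is not None:
        flush()
    return entries
```

Entries marked "#, fuzzy" are still returned here; a real tool would keep
that flag, since it tells the translator the entry needs review.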

For me, on the path from KBabel's present features to our dream product, the
most sensitive issue is making it able to handle documents in a tagged
format (XML structure tags, HTML layout tags).

> > 2) Server API & connection
> > Even though I wrote out an API which certainly looks very close to what
> > we will end up with, we have not accurately stated the dialog between
> > client & server.
> > I'm presently thinking about it and I'll try to provide a detailed
> > document within a few weeks. Basically, seeing what we may be able to
> > optimize and the low number of connected clients (as compared with an
> > average HTTP server), I believe it will be a (simple) connected mode.
>
> It sounds reasonable. Let's start with simpler methods and then introduce
> advanced features.

We might start from the existing Berkeley plug-in:
- assessing its features
- adding HTTP access (connection management & read/update queries in
connected mode)
- bringing improvements

> > 3) KBabel features
> > (...)
> > - What does it presently do concerning translation memories?
>
> This is probably the least advanced area of KBabel. Technically, we
> support translation memory plugins with a fairly simple interface (maybe
> too simple, but this can easily be changed, and I need feedback on what is
> needed, ideally from someone trying to develop a new, advanced module).

Fuzzy matching, as its very name suggests, has a rather loose definition and
a more or less approximate implementation. We professional translators
appreciate its power and ease of use, but we must run more specific tests of
existing proprietary CAT software in order to check our assumptions and
refine our design.
As long as KBabel is able to store source & target strings, fuzzy matching
will be about building custom indexes (possibly along the lines of my DB
Indexing specification document, or otherwise), and this is an area where:
- we want to take the time to assess existing proprietary software in depth
- we need to take the time to state what the ideal features and behaviour
would be
Of course, we can also try to implement our TM at the filesystem level :-)

Stanislav, I think we'll be able to provide good guidelines very soon, but
this area will obviously need refining over time. In my DB Indexing
document, I have given an example of a system in which we can quickly
perform advanced testing by changing the parameter values used by the
indexing functions.
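To make the idea of a tunable fuzzy match concrete, here is a minimal sketch
that ranks TM entries by similarity to a new source segment. It uses Python's
difflib.SequenceMatcher ratio as a stand-in similarity measure, and the
threshold and limit parameters are hypothetical; it is not the scheme from
the DB Indexing document, and a real TM server would use an index rather
than scan every entry.

```python
from difflib import SequenceMatcher

def fuzzy_matches(segment, memory, threshold=0.7, limit=3):
    """Return up to `limit` (score, source, target) tuples, best first.

    `memory` is a list of (source, target) pairs; `threshold` is the
    minimum similarity (0.0 to 1.0) for a match to be reported.
    """
    scored = []
    for source, target in memory:
        # ratio() is 1.0 for identical strings, i.e. a perfect match.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((round(score, 2), source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```

Raising or lowering threshold is exactly the kind of parameter change that
makes quick testing possible: the same memory can be queried with stricter
or looser matching without rebuilding anything.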

> ATM, we have the following plugins:
> 1. Translation memory based on Berkeley Database II
>    - supports storing the translations on-the-fly, but without any
>      possibility to control what goes in and what does not

Do you mean that you send update queries and don't check the result
(whether data was written or not) afterwards?

>    - retrieving exact translations and single-word translations works

Right, these are perfect matches in our jargon.

If you have written a specification document, I'll be happy to review it;
I'm sure I'll learn from it - and I'll stop asking "detailed" questions
until I grasp the whole picture more clearly.

> 2. PO compendium
>    - retrieving exact/partial translations from compendium

If I understand you correctly, "compendium" is the name you give to a set
(catalogue) of reference source segments (a translation memory)?
So "partial" here means fuzzy matching (retrieving more or less similar,
already translated source segments and using their translations to enter
the proper translation of the current source segment more quickly).

> 3. Auxiliary PO file
>    - retrieve exact translations from another PO file - used for similar
>      languages, e.g. translating to Slovak could use this module
>      to initialize the translation from Czech (it's not ideal,
>      but it can help a lot).
>    - This plugin is typically used for _searching_ the translation,
>      not for automatic initialization.

I'm not sure we would use it, but if it helps others, why not? I doubt
we'll bring anything new in this area.

> 4. TMX compendium
>    - slight modification of (2), which roughly supports TMX 1.4 format
>      The format itself should probably move to import/export filters
>      making this plugin obsolete, since you could use PO Compendium then
>      (the name is not that great, I know :-))

TMX is our most important standard - it enables us to exchange TMs
seamlessly between various proprietary CAT tools.
We only need it for TM import/export - we consider ourselves totally free to
choose any (suitable) TM database format internally.
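Since TMX is only needed at the import/export boundary, the export side can
stay small. A minimal sketch of writing TMX 1.4 with Python's standard
xml.etree.ElementTree follows; the header values (tool name, version,
o-tmf, datatype) are placeholders chosen for illustration, and a real
exporter should be checked against the TMX specification.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def to_tmx(pairs, srclang, tgtlang):
    """Serialize (source, target) pairs as a TMX 1.4 document string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "freecats-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",                    # placeholder original TM format
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Because the internal TM format is free, an import filter would simply do the
reverse: walk body/tu/tuv and load each seg into whatever store we choose.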

> > - Does it recycle (remember) existing translations for a given source &
> > target language pair, and if yes, how?
>
> It's done by the translation memory. ATM, you can choose to store the
> translation (language, file, original message, translated message) on
> every change, or manually let the database read a file.

We are not accustomed to keeping track of such a reference to a specific
file, but we understand the obvious reasons why you used it. We plan to keep
track of some ancillary data too.

> > - Does it allow fuzzy matching?
>
> The plugin interface does, but what exactly "fuzzy" means is up to the
> plugin. Also, only PO compendium supports fuzzy matching ATM.

So it's the TM server's job - such a layered design is good. The more
layered a piece of software is, the more easily it can be adapted - you see
my point here ;-)

> > Let me also know if you can easily access our specification
> > documents which you can find as attachments in the mailing
> > list archive, or if you prefer me to send them to your e-mail
> > address.
>
> It seems quite easy to find them. Maybe it's time to move them
> outside of the mailing list archive and put them on the web page with
> a notice that they are a work in progress.

Sure. Ideally, I need a few days to split the first one into chapters,
quickly review some of them and provide them as separate HTML files.

Simos, could you please handle the document publishing aspect on Savannah?

> (...)
> Yes, it is. I would be happy if you could test KDE 3.1 (it contains KBabel
> 1.0 with a lot of enhancements, but mostly editor-wise, not for plugins).
>
> All the latest releases are available on the KBabel homepage in source
> form.

Fine. I just saw that KDE 3.1 is considered stable.
I hope Red Hat makes it available as RPMs, as for now I've only installed
a couple of small things on my box, and KDE is a large project.

> > (...)
> > So, I would like to know:
> > - How close or how far KBabel is from being ported
> > to Win32 and Mac OS X platforms
>
> Mac OS X support is pretty close, since an effort to port KDE to Mac OS X
> is progressing nicely. Win32 support is a problem, since KBabel relies
> heavily on KDE libraries.

I was not clear here. We're not asking for a full KDE port - how would I
dare ;-) - only KBabel.
I also want to state clearly that this is only a question about the current
state of things; it does not mean we require anything here.
For us, Win32 & Mac OS X support will obviously be a major goal. For now,
one may find a workaround via a dual-boot configuration, Virtual PC or
whatever else.
Maybe the ENSTB team could work on porting KBabel to Windows if we choose
this option (starting from KBabel and cooperating closely with you in order
to develop new features on it).

I think we should give you a little time to:
- assess the effort required to port KBabel to Win32 / Mac OS X
- determine how much of this effort you can/cannot provide
- see how the KDE team could take care NOT to break this portability in all
future KBabel upgrades

> > Our initial idea was something along the lines of Python + wxPython,
> > C/C++ or whatever similar. I see on KBabel's home page that it uses
> > and/or plans to use:
> > - Qt (Qt Designer IS a portable GUI designer, and it's free to
> > use for GPL software)
>
> On X11

If I remember correctly, this means porting is not going to be that easy.

> > - BerkeleyDB (which I thought about as an option for our TM server, also
> > GPL'ed and portable)
> > As far as I can tell with a first look, it really looks feasible.
>
> It's portable, but we have identified the following problem, which haunts
> us pretty hard: it is not source/binary compatible between major versions.
> So if the application is developed for version 2 (as is the KBabel
> plugin), one needs to adapt it for each different version, and the
> database must be rebuilt. We plan to rewrite the module using generic SQL
> and allow connecting to any SQL database (there is also SQLite, which is
> nice for personal use without too much hassle).

A SQL-type database is handy in many ways, but from our point of view, it
might not bring that much help. Have you considered using the filesystem and
a number of flat files for a TM server, or do you consider this option as
highly exotic/risky/bad for whatever reason?
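For comparison, the generic-SQL approach Stanislav describes could look
roughly like this with SQLite, through Python's standard sqlite3 module. It
stores the (language, file, original message, translated message) record he
mentions and answers exact-match queries only; the table, column and
function names are hypothetical, and fuzzy matching would still need its own
index on top.

```python
import sqlite3

def open_tm(path=":memory:"):
    """Open (or create) a TM database; ":memory:" keeps it in RAM."""
    db = sqlite3.connect(path)
    # One row per distinct translation. The index SQLite builds for the
    # UNIQUE constraint also serves lookups on (language, original).
    db.execute("""
        CREATE TABLE IF NOT EXISTS tm (
            language   TEXT NOT NULL,
            file       TEXT,
            original   TEXT NOT NULL,
            translated TEXT NOT NULL,
            UNIQUE (language, original, translated)
        )""")
    return db

def store(db, language, file, original, translated):
    # INSERT OR IGNORE silently skips a pair that is already stored,
    # even if it came from another file.
    with db:  # the connection commits when the block succeeds
        db.execute("INSERT OR IGNORE INTO tm VALUES (?, ?, ?, ?)",
                   (language, file, original, translated))

def exact_matches(db, language, original):
    """Return every stored translation of `original` into `language`."""
    rows = db.execute(
        "SELECT translated FROM tm WHERE language = ? AND original = ?",
        (language, original))
    return [row[0] for row in rows]
```

A flat-file TM would have to reimplement the duplicate check and the lookup
index by hand; that is the main trade-off behind the question above.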

> > Let me be clear. KBabel's present aim is to help KDE team localize its
> > software and it must of course remain as such, but it may not be an
> > exclusive option. If I believed, one way or another, that my suggestion
> > would hamper KDE team's ability to deal best with KDE's own localization
> > goals, I would not propose it.
> > (...)
>
> My key point, and the reason I responded in the first place, is that
> KBabel could provide an almost ready-to-go client with support for network
> transparency and other nice KDE features. It is in fact more of a
> translation editor (no fancy fuzzy/automatic translation support). And I
> can help adapt KBabel to test/prototype the server etc. This is a win-win
> situation IMHO.

Certainly.

> > So, Stanislav, if you believe KBabel is an option for us - if
> > KBabel team, which you manage, sees the following features
> > as a Good Thing:
> > - GUI client portability
>
> A bit of a problem. You really need the KDE libs, available on Unix-like
> platforms only ATM (there is a KDE 2.2 port to Windows, but newer
> KBabel versions need more recent libraries).

I would really appreciate it if you could provide a more detailed picture
(possibly taking the time to explain things to non-Linux programming experts
like us).
This is where it might hurt.

For instance, did you use these non-portable (KDE-specific) libraries
because:
1) your project being KDE-centered, at least until now, you did not need to
care?
2) you need features that are harder to get, or missing, in wxWindows or
other similar Linux/Win/OS X portability layers?

> > - translation memory technology (fuzzy matching)
> > - Unicode
>
> Qt and KBabel work in Unicode internally.

Great.

> > - adding as many different file formats as possible by building up
> > the required conversion filters
> > Maybe OmegaT team (Keith Godfrey & friends) can also help for
> > the XML / OO Writer file formats.
>
> XML is pretty easy, Qt provides nice support for that.

Good, especially since we do NOT need to care about XML validation - if a
file was good enough, then its translation won't be worse as long as it does
not alter any structure tags.
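The "don't alter structure tags" rule suggests a cheap check that needs no
validation at all: compare the tags found in the source and target segments.
A minimal sketch using a regular expression follows; the pattern is a
simplification that assumes well-formed, unescaped tags, and it is not how
Qt's XML support or any existing filter does it.

```python
import re
from collections import Counter

# Opening, closing or self-closing tags such as <b>, </b>, <br/>, <a href="x">.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def tags_preserved(source, target):
    """True if the target segment contains exactly the same tags as the source.

    Tags are compared as a multiset (order ignored) because inline tags may
    legitimately move when word order changes between languages.
    """
    return Counter(TAG.findall(source)) == Counter(TAG.findall(target))
```

Comparing multisets rather than sequences is a deliberate choice here; a
stricter variant could also check tag order or nesting for block-level
structure tags, where reordering really would alter the document.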

> > then I'm sure we'll all want to join and help to:
> > - Immediately elect KBabel as our translation client of choice for
> > Free CATS, whose scope would be reduced to building up a server
> > component and helping to enhance KBabel
>
> No need to elect something, you should provide as much flexibility as
> possible.

> > - Build our Free CATS TM server on top of Berkeley DB in close
> > cooperation with what you already did for KBabel.
>
> As I've already mentioned, this is probably not a good option.

Well, if you read "filesystem" instead of "Berkeley DB", would your answer
be different? (this is my last try at it)

> > While few of us can directly contribute to code (and it also
> > depends on the language used), we can still do an awful lot
> > in terms of documentation, interface localization & public
> > relations (how to make a free software developers' localization
> > tool into the best tool available for professional translators
> > & free software localization volunteers alike).
>
> But the goals of FreeCATS are so ambitious that it's necessary to
> take small steps, and KBabel could provide a foundation/replacement
> for the missing parts while the others are being developed.

Well, yes and no. That's the nice thing with any modular architecture.
Once our TM server works somehow, everybody will be free to develop whatever
clients (interactive translation, alignment of legacy translation, counting
& analysis) they want, portable or not.

So, if you add a plugin to KBabel that enables it to use a Free CATS TM
server, it's very nice to know, very useful for tests and so on.

All in all, at least at first sight, and apart from the portability issues,
there seems to be a close match between what KBabel presently provides
and the kind of interactive translation client we would like to see. At
least, this was what I thought this morning :-)

We are not presently in a position to build a GUI client for our Free CATS
server soon, so it's tempting to select one of the most similar (or least
different) free software projects available and see if we can make it fit
into our plans - OF COURSE, only if, and as long as, it fits; that is, if
it's reasonably in line with the original project's own goals.

There are several options, all of which include building our TM server.
I'm considering the translation client here:

1) we start from scratch, with whatever tools we want, and are happy to see
other teams developing other clients to use our server if/when they feel
like it (nice idea, except we lack resources for now)

2) we start from another project which we consider is close enough, then
work on our own, at our own pace (possibly slow) - but isn't that forking?

3) we start from another project which we consider is close enough, AND work
in good cooperation with its team so that everything we want to add fits
nicely within the existing stuff

It should be obvious that my main aim here is to avoid duplicating efforts
where feasible, nothing more.


Cheers,

Henri




