freecats-dev
Re: [Freecats-Dev] Object Prevalence (cont.)


From: Yves Champollion
Subject: Re: [Freecats-Dev] Object Prevalence (cont.)
Date: Tue, 4 Mar 2003 17:46:03 +0100

From: "Henri Chorand" <address@hidden>
To: "Free CATS Dev List" <address@hidden>

> Prove it, then - on your FTP site, preferably, mine is too minuscule for
> this  :-))
> Now I'm thinking about it, I know a translator who's working on a 4 million
> TUs TM - and it's a nightmare with Trados, of course ;-)

What are the specs for the TM engine that Free CATS plans to develop (for
example, the maximum DB size)?

My opinion is that TM engines run databases very much *unlike* all other DB
engines I know, so developing a DB engine from scratch makes sense. One
major point is that the index/contents ratio in TM engines is higher than
in usual DB systems.

In a classical DB, indexes are sorted subsets of the DB. They are most often
accessed by dichotomic (binary) search. With linguistic fuzzy searches,
dichotomic search loses its meaning, because a near match need not sit
anywhere near the query in sort order. Conclusion: forget existing DB
engines. (One temptation is to index all words rather than all segments, but
in practical terms this leads to a dead end beyond a certain DB size: you may
end up with indexes 10x larger than the DB itself, impossible to load in RAM,
with very slow performance and very long indexing procedures.)
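
To illustrate why a sorted index stops helping once the search is fuzzy, here
is a minimal Python sketch. The segments, the query and the use of difflib's
SequenceMatcher as a similarity score are all assumptions made for the
example; they are not part of any Free CATS design.

```python
import bisect
from difflib import SequenceMatcher

# Hypothetical source segments, kept sorted as a classical index would be.
segments = sorted([
    "Close the file before exiting.",
    "Open the file before editing.",
    "Press the red button to stop.",
    "The file could not be opened.",
])

def exact_lookup(query):
    """Binary (dichotomic) search: succeeds only on an exact match."""
    i = bisect.bisect_left(segments, query)
    return i if i < len(segments) and segments[i] == query else None

def fuzzy_lookup(query):
    """Score every segment; the sort order offers no shortcut here."""
    return max(segments, key=lambda s: SequenceMatcher(None, query, s).ratio())

query = "You must open the file before editing."
exact_hit = exact_lookup(query)                 # no exact match exists
landing = bisect.bisect_left(segments, query)   # where binary search ends up
best = fuzzy_lookup(query)                      # closest segment by similarity
```

Binary search lands at the end of the sorted list, next to a segment starting
with "The", while the most similar segment starts with "Open" and sits
elsewhere in the index. The fuzzy lookup has to examine every entry.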

So your idea to "load the DB in RAM" makes sense, but you'd load a smart
(linguistic matching-oriented) compression of the source segments rather than
the entire database. In my own experience, you're left with about 1/5th of
the original database: 1/6th for the text, plus pointers, comes to 1/5th.
This way, a TM of 100 Mbytes yields a fully searchable index of about
20 Mbytes. This index definitely belongs in RAM: it has to be searched
repeatedly for matching bits and pieces whenever no exact match is found at
first. I call this an index although, in classical terms, it's more like a
compressed database in its own right.
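
A rough Python sketch of that layout follows: a reduced copy of each source
segment plus a pointer back to the full record, held in memory and searched
exactly first, then fuzzily. The reduction used here (lowercasing and
dropping punctuation) is only a placeholder assumption, since the actual
matching-oriented compression is not described above. The sample TM, the
function names and the 0.7 threshold are likewise hypothetical.

```python
import string
from difflib import SequenceMatcher

# Hypothetical TM records (source, target) standing in for the on-disk DB.
TM = [
    ("Open the file before editing.", "Ouvrez le fichier avant de le modifier."),
    ("Close the file before exiting.", "Fermez le fichier avant de quitter."),
    ("Press the red button to stop.", "Appuyez sur le bouton rouge pour arrêter."),
]

_STRIP = str.maketrans("", "", string.punctuation)

def reduce_segment(text):
    """Placeholder reduction: lowercase, drop punctuation, collapse spaces."""
    return " ".join(text.lower().translate(_STRIP).split())

# In-RAM index: reduced source text plus a pointer (here, a list position).
INDEX = [(reduce_segment(src), pos) for pos, (src, _) in enumerate(TM)]
EXACT = {key: pos for key, pos in INDEX}

def lookup(query, threshold=0.7):
    """Try an exact hit on the reduced text, then scan for the best fuzzy one."""
    key = reduce_segment(query)
    if key in EXACT:
        return TM[EXACT[key]], 1.0
    score, pos = max((SequenceMatcher(None, key, k).ratio(), p) for k, p in INDEX)
    return (TM[pos], score) if score >= threshold else (None, score)

def estimated_index_mb(tm_mb, text_fraction=1 / 6, total_fraction=1 / 5):
    """Size estimate from the ratios quoted above: (text only, text + pointers)."""
    return tm_mb * text_fraction, tm_mb * total_fraction

exact_entry, exact_score = lookup("OPEN the file, before editing!")
fuzzy_entry, fuzzy_score = lookup("Open the file before printing.")
text_mb, total_mb = estimated_index_mb(100)
```

With the quoted ratios, a 100 Mbyte TM gives roughly 16.7 Mbytes of reduced
text and about 20 Mbytes once pointers are included, so the pointers account
for the remaining 1/30th or so.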

Yves Champollion
