
Re: [Freecats-Dev] Underlying Unix base


From: Marc Prior
Subject: Re: [Freecats-Dev] Underlying Unix base
Date: Fri, 9 May 2003 10:25:26 +0200

I'm having this discussion on several mailing lists at once at the moment. 
If/when I have time, I might summarize some of the issues and facts concerned 
for those who are interested.

Regarding this business of the "underlying Unix base" of OOo:

OOo is written mainly in C++. The source code is available.

OOo's file format (regardless of the platform, i.e. Solaris, Windows, Linux, 
Mac) is essentially XML (the XML files are packaged in a zip archive).

OOo's macro language is... a complicated subject. OOo has a macro language 
(Star Basic) based upon BASIC and similar to, but not the same as, MS Word's. 
A lot of VB (or is it VBA? I get those confused, too) code could probably be 
ported quite easily IF someone had a knowledge of Star Basic. The main 
problem, I suspect, is not so much the differences between the languages as 
the fact that hardly anyone knows Star Basic.

HOWEVER, OOo is not limited to Star Basic as a macro language. It has an API 
which enables other programming languages to be used for macro programming. 
As Sun owns both Star Office and Java, not surprisingly Java is the language 
with which most of the work has been done so far.

What this means, coming back to Kirk's original question, is this:

A TM application could handle an OOo file in a number of different ways.

On the lowest level, an OOo file can be regarded simply as tagged text. This 
is what OmegaT does: it identifies OOo paragraph tags and uses them as 
segment markers. It doesn't, strictly speaking, parse the XML document as 
such (at least, not as far as I'm aware); it treats an OOo text file in much 
the same way as it treats an HTML file.

Obviously, this solution can be implemented easily in any programming 
language. The TM application I began writing in tcl/tk worked on the same 
principle.
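
To illustrate the tagged-text approach, here is a minimal sketch in Python. It 
is not OmegaT's actual code, just the same idea: find the paragraph tags with 
a regular expression and treat each paragraph as a segment. The function name, 
the sample string and the choice of Python are my own assumptions; text:p is 
the paragraph element in an OOo Writer document.

```python
import re

# One paragraph element. The lookbehind skips self-closing <text:p/> tags,
# which would otherwise swallow everything up to the next closing tag.
PARAGRAPH = re.compile(r"<text:p\b[^>]*(?<!/)>(.*?)</text:p>", re.DOTALL)
# Any remaining inline tag (spans, tabs, etc.) inside a paragraph.
INLINE_TAG = re.compile(r"<[^>]+>")


def segments_from_tagged_text(raw):
    """Return paragraph texts, treating the file as plain tagged text.

    No XML parsing takes place, so entities such as &amp; are left as-is.
    """
    segments = []
    for match in PARAGRAPH.finditer(raw):
        text = INLINE_TAG.sub("", match.group(1)).strip()
        if text:
            segments.append(text)
    return segments


# Hypothetical sample, shaped like a fragment of an OOo Writer document.
sample = (
    '<office:body>'
    '<text:p text:style-name="Standard">First paragraph.</text:p>'
    '<text:p text:style-name="Standard">Second <text:span>one</text:span>.</text:p>'
    '</office:body>'
)

print(segments_from_tagged_text(sample))
```

Note how much care the regular expression already needs for a trivial case; 
this is the price of not parsing the XML as XML.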

The next level up would be for the TM application to treat the file not as 
tagged text, but as what it really is, namely XML. Such a solution is 
initially more complex. However, many programming languages, including Java 
and tcl/tk, have inbuilt mechanisms for handling XML. A programmer would have 
to have knowledge of, or familiarize himself with, the mechanisms in the 
language concerned, but this would very quickly pay off: the horrors Yves 
describes with parsing/segmenting would be eliminated, because the document 
would be handled "logically". As a further benefit, the solution could also 
be applied extremely easily to any other XML format.
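
A sketch of this second level, again in Python and again only an 
illustration: the standard library's XML parser does the work, and the 
segmenting code just asks for the paragraph elements. The namespace URI is the 
one I believe the OOo 1.x format declares for text elements; treat it as an 
assumption and check it against the xmlns:text attribute of your own file. 
The function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: text namespace URI of the OOo 1.x format. It is a parameter
# below because other versions of the format may declare a different URI.
TEXT_NS = "http://openoffice.org/2000/text"


def segments_from_xml(xml_source, text_ns=TEXT_NS):
    """Return the text of each non-empty paragraph element."""
    root = ET.fromstring(xml_source)
    paragraph_tag = "{%s}p" % text_ns
    segments = []
    for para in root.iter(paragraph_tag):
        # itertext() collects the text of the paragraph and of any inline
        # elements nested in it; entities are already decoded by the parser.
        text = "".join(para.itertext()).strip()
        if text:
            segments.append(text)
    return segments


# Hypothetical sample with an entity, an empty paragraph and an inline span.
sample = (
    '<office:body xmlns:office="http://openoffice.org/2000/office" '
    'xmlns:text="http://openoffice.org/2000/text">'
    '<text:p text:style-name="Standard">Fish &amp; chips.</text:p>'
    '<text:p/>'
    '<text:p>Second <text:span>one</text:span>.</text:p>'
    '</office:body>'
)

print(segments_from_xml(sample))
```

Empty paragraphs, entities and nested inline tags need no special handling 
here, and pointing the function at another XML format is a matter of changing 
the namespace and the element name.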

The two levels I've just described are suitable for a standalone 
implementation. The third level up is quite different. Rather than the TM 
application manipulating the OOo text file directly, the file is manipulated 
by OOo itself. For this 
purpose, an external programming language accesses the functions of OOo 
through the OOo API. The programmer does not need to have any knowledge of 
the OOo format or of XML. The concept is very promising, but as yet, very few 
people out there appear to have experience of it. 

Note to Yves: although Sun has clearly taken the decision to make OOo/SO a 
clone of Word EXCEPT for the macro language, there is another word 
processor/office suite which aims to go the whole way, and support native 
Word macros. That product is Softmaker/Textmaker. (It is not free, not 
open-source and not XML, but very reasonably priced, at EUR 50 - in other 
words, like Wordfast.) The Windows version has been available for several 
years; the Linux version has only just been launched. It might be worth 
looking into whether Wordfast will run with it. The programmers are very 
upbeat about it and very approachable, so it is worth talking to them 
personally.

Marc



