
Re: swarm and database


From: Benedikt Stefansson
Subject: Re: swarm and database
Date: Tue, 14 Dec 1999 15:06:46 -0700

Alex Lancaster wrote:

> >>>>> "PJ" == Paul E Johnson <address@hidden> writes:
>
> PJ> I thought PostGreSQL was the free and open alternative.  Is there
> PJ> a reason you don't mention it?
>

<snip>

>
> I don't pretend to be a database expert (caveat reader), but I have
> used a number of commercial databases in the past (when I worked on
> non-free software for internal corporate systems) such as Oracle and
> Sybase.  From all accounts PostGresSQL is pretty good (I think
> Benedikt is/was using it at some point).

I've been writing a large application in Swarm which relies heavily on object
serialization with a database backend, so I thought I'd share some of my thoughts
about why I did not end up writing an interface between the DB and Swarm.

The basic problem I needed to solve was this:

The data for object serialization would reside in the DB, distributed over a
number of tables. Separate (Swarm) applications needed to read/write and exchange
serialization data. A typical scenario is that I must be able to merge the output
of one application with additional data from the database before creating
serialization data to be read into another application.

For the DB I decided on PostgreSQL. *)

My first inclination was to write an Objective-C wrapper around the libpq
library, which is the C library interface to PostgreSQL (and obviously open
source, like all of this stuff). However, I ran into a silly problem which
prevented me from doing that: the symbol "Index" is used heavily in the library,
and it conflicts with the "Index" class in the collections library in Swarm.

Before I blew a couple of days getting around this problem in porting libpq, I
decided that the benefit-cost ratio was too low.
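
One way around the clash (had I pursued it) would have been to hide libpq behind
a tiny plain-C shim, so that no file which imports the Swarm headers ever sees
libpq's "Index". A rough sketch of the idea -- pgshim.* and the function names
are made up, only the PQ* calls are the real libpq API:

    /* pgshim.h -- a hypothetical shim; no Swarm headers in here, so no "Index" clash */
    void *pgshim_connect (const char *conninfo);
    int   pgshim_count_rows (void *conn, const char *query);
    void  pgshim_disconnect (void *conn);

    /* pgshim.c -- the only file that includes libpq-fe.h */
    #include <libpq-fe.h>
    #include "pgshim.h"

    void *
    pgshim_connect (const char *conninfo)
    {
      PGconn *conn = PQconnectdb (conninfo);

      if (PQstatus (conn) != CONNECTION_OK)
        {
          PQfinish (conn);
          return NULL;
        }
      return conn;
    }

    int
    pgshim_count_rows (void *conn, const char *query)
    {
      PGresult *res = PQexec ((PGconn *) conn, query);
      int n = -1;

      if (PQresultStatus (res) == PGRES_TUPLES_OK)
        n = PQntuples (res);
      PQclear (res);
      return n;
    }

    void
    pgshim_disconnect (void *conn)
    {
      PQfinish ((PGconn *) conn);
    }

An Objective-C wrapper in the Swarm app could then call the pgshim_* functions
through opaque pointers and never touch libpq's typedefs. But that only buys you
freedom from the name clash, not from the deeper mapping problem.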

The problem I was facing went deeper, i.e. I first needed to figure out a way to
map cleanly between the record/field worldview of relational databases and the
world of objects. I decided that the level of information that would have to be
built into the Swarm application about the structure of the database would be so
high that it would be simpler to prepare data before a run, and rely on existing
methods to read serialization data into the Swarm app.
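
To give a concrete idea of what "existing methods" means: the prepared .scm file
is just an ordinary LispArchiver archive, roughly like this (the class and
parameter names here are only an example):

    (list
     (cons 'modelSwarm
           (make-instance 'ModelSwarm #:numAgents 100 #:worldSize 80)))

and the Swarm app pulls it in with the standard archiver calls, e.g. inside
buildObjects (this is from memory, so check the names against the defobj docs;
lispAppArchiver looks for the application's default .scm file):

    // Fragment of an observer's buildObjects, assuming the usual Swarm headers.
    // getWithZone:key: (name from memory) hands back the object archived under
    // the key "modelSwarm".
    if ((modelSwarm = [lispAppArchiver getWithZone: self
                                       key: "modelSwarm"]) == nil)
      raiseEvent (InvalidOperation,
                  "Can't find the modelSwarm parameters in the .scm file");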

So in practice I ended up writing 'middleware' (using Perl and Java) that sucks
data out of the DB and writes serialization files, and takes output from the
Swarm apps and writes it back into SQL.

The flow of information is: DB -> XML (via Perl script) -> .scm format (via XSL
parser) -> Swarm (via LispArchiver) -> XML (via my own hack of the Archiver) ->
DB (via Perl script), or back into .scm format (via XSL parser).

The reason I'm using XML/XSL in all of this is that I wanted it to be as
portable as possible; it also allows me to reuse code for different purposes
(object serialization, web interface, etc.).

In all of this I also decided against using HDF5, since it was harder to move
data between HDF5 and the database going both ways. The reason I use XML <-> .scm
is that I like having data in a human-readable format (for debugging, among
other things).

So, getting back to the original thread of this discussion, one needs to ask
whether there is a serious advantage in making Swarm apps capable of making
direct callbacks to a DB. There are performance issues, compatibility issues and
portability issues. Most of all, the transition of data from a relational
database to an object-oriented system is messy; with new object-oriented
databases becoming available (though none for Linux yet, I fear) this may be
becoming easier to handle.


Regards,
Benedikt

*) While MySQL and mSQL are probably equally or more popular, PostgreSQL supports
a larger subset of the SQL language, and it has some serious backing from
research types. There is a cult around MySQL and mSQL because they are used in so
many (Linux) web applications, and an O'Reilly book just came out about them.
I think it is pretty much a matter of personal taste which one you go with.

PostgreSQL has a _very_ nice Tk client called pgaccess which is a real life
saver, and a command line client called psql which is nicely integrated with
readline etc. There is also an ODBC driver, which allows you to send data over
TCP/IP from Windows clients into PostgreSQL (e.g. export data from Access to
your Linux box).

PS. The LispArchiver and HDF5Archiver are both incredibly useful tools; anyone
still using ObjectLoader and ObjectSaver should look into moving to the new
formats. My next project involves developing stuff for a Beowulf cluster, and I
expect to make use of serialization/deserialization for some poor man's
parallelization there. More on that later.
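
For anyone who hasn't tried the archivers yet, the saving side is pleasantly
small too. A rough sketch (the message names are quoted from memory, so verify
them against the defobj/Archiver documentation; archiveModel is just a name I
made up):

    // Hand the archiver an object under a key, then sync to write the file.
    // (Message names from memory -- check the defobj docs.)
    - archiveModel
    {
      // putDeep: archives the object and what it references; putShallow: does not.
      [lispAppArchiver putDeep: "modelSwarm" object: modelSwarm];
      [lispAppArchiver sync];   // flushes the archive out to the .scm file
      return self;
    }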

--
Benedikt Stefansson      | address@hidden
CASA, Inc.               | Ph : (505) 988-8807 x101
Santa Fe, NM 87501       | Fax: (505) 988-3440





