pan-devel

[Pan-devel] Re: Move to database backend


From: Charles Kerr
Subject: [Pan-devel] Re: Move to database backend
Date: Wed, 10 Mar 2004 09:37:31 -0800
User-agent: Mutt/1.5.4i

On Mon, Mar 08, 2004 at 01:15:36PM -0500, Tom wrote:
> Hi,
> 
> I have seen occasional mentions on the pan-users list about plans to
> move to a database back-end. I have some database experience, so I
> thought I'd find out what the status of that work is, and see if I could
> be of help. 
> 
> Basically I thought of doing the following, at least as a start:
> Write a conversion routine/utility which would read the current files
> and load the database. I've been looking at the CVS tree, specifically
> pan/base/file_headers.c and related files.
> 
> Questions:
> 1. First, where is the right place to raise these questions? I sent a
> couple of messages to the pan-devel list, (address@hidden), but I
> didn't get copies, nor do they show up in the archives. The last message
> in the archives is from December, 2003. 
> 2. How much of this work is already done?
> 3. What db will be used? I think Charles mentioned that he was planning
> on using Berkeley DB. I see this isn't a SQL database. At first glance,
> if you want the records in a different order, you need to set up an
> index corresponding to the order you want, and then retrieve using that
> index. 
> 4. Will the database be just for the header information, with the
> article bodies stored as they are now?

Hi Tom!

Now is as good a time as any to do the database backend -- Pan's
fairly stable right now, and the code isn't changing much.

pan-devel is the better place to have this conversation, so that
anyone else can chime in with suggestions, and since gnu.org does
pan-devel backups.  It looks like your letter made it through:
http://mail.gnu.org/archive/html/pan-devel/2004-03/msg00000.html

I've written down about eight pages in my notebook on what I think
would be good.  In general I'm trying to keep as much information as
possible in the database rather than in memory.  Ideally the Article struct
would go away.  article-thread is already redundant because Pan
threads an article when it's downloaded instead of each time the
group is loaded.[1]

Right now I'm using SQLite, as it's fast, embeddable and portable.[2]

So far I've got the headers being inserted into the database as
they are downloaded, and multiparts and plaintext articles are
all threaded inside of the database so that we don't have to
rethread the entire group every time we load Pan.  This is running
parallel to file-headers, though, since I don't have articlelist
reading out of the database yet.  But I've got some ideas on how to
do that.
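
A minimal sketch of what "threaded inside of the database" could look
like, assuming SQLite and a single table.  It is written in Python with
the standard sqlite3 module only to keep it short and runnable; Pan
itself is C.  The table layout, the column names and the helper
functions are all hypothetical, not taken from Pan's code.  Each header
row stores the Message-ID of its direct parent (the last entry of the
References header) at insert time, so loading a group becomes a query
rather than a rethreading pass.  Multipart handling is left out.

```python
import sqlite3


def open_header_db(path=":memory:"):
    """Open (or create) a header database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS headers (
               message_id TEXT PRIMARY KEY,
               parent_id  TEXT,              -- direct parent, NULL for a root
               subject    TEXT NOT NULL,
               posted     INTEGER NOT NULL   -- posting time as a Unix timestamp
           )"""
    )
    # Index so that "all replies to X" does not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS headers_by_parent ON headers (parent_id)"
    )
    return conn


def store_header(conn, message_id, references, subject, posted):
    """Insert one header as it is downloaded, recording its parent."""
    # The last Message-ID in References is the direct parent.
    parent_id = references[-1] if references else None
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO headers "
            "(message_id, parent_id, subject, posted) VALUES (?, ?, ?, ?)",
            (message_id, parent_id, subject, posted),
        )


def replies_to(conn, message_id):
    """Return the Message-IDs of direct replies, oldest first."""
    rows = conn.execute(
        "SELECT message_id FROM headers WHERE parent_id = ? ORDER BY posted",
        (message_id,),
    )
    return [row[0] for row in rows]


def thread_roots(conn):
    """Return articles with no parent, or whose parent is not stored."""
    rows = conn.execute(
        """SELECT h.message_id
             FROM headers AS h
             LEFT JOIN headers AS p ON p.message_id = h.parent_id
            WHERE p.message_id IS NULL
            ORDER BY h.posted"""
    )
    return [row[0] for row in rows]
```

Because thread_roots() decides at query time whether a parent is
present, an article whose parent has expired shows up as a root without
any stored rows being rewritten.  That is one possible way to handle the
expiry case, not necessarily the one Pan will use.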

My big concern right now is speed.  Hard drive speed seems to be crucial
for keeping the headers in a database -- when I download headers right
now, disk access is the bottleneck rather than bandwidth.  So an
experienced DB person's opinion on how to tune the tables would be
great.
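
One common tuning step for this kind of workload in SQLite is to group
many inserts into a single transaction, since every commit has to be
flushed to disk.  Whether that is the bottleneck in Pan's case is an
assumption; the sketch below only shows the technique.  It is in Python
with the standard sqlite3 module for brevity, and the table, the
function names and the batch size are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical headers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE headers (message_id TEXT PRIMARY KEY, subject TEXT)"
    )
    return conn


def insert_headers_batched(conn, rows, batch_size=500):
    """Insert (message_id, subject) tuples, one commit per batch.

    Returns the number of rows submitted.  Rows whose message_id is
    already present are skipped by INSERT OR IGNORE but still counted.
    """
    sql = "INSERT OR IGNORE INTO headers (message_id, subject) VALUES (?, ?)"
    batch = []
    submitted = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # one transaction, one commit, for the whole batch
                conn.executemany(sql, batch)
            submitted += len(batch)
            batch = []
    if batch:  # flush whatever is left over
        with conn:
            conn.executemany(sql, batch)
        submitted += len(batch)
    return submitted
```

A larger batch_size means fewer commits and less waiting on the disk,
at the cost of losing more freshly downloaded headers if the program
dies in the middle of a batch.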

I'll transcribe my notes tonight or tomorrow, and mail them and my
code changes to pan-devel.

-- 
cheers,
Charles

[1] To be pedantic, the article does have to be rethreaded when its
    parent article expires, but you get the idea.

[2] Some people have talked about being able to plug in other databases,
    which would be nice too.  I know the gnome-db wrapper API works for
    SQLite, postgresql, MySQL, oracle, etc.  I don't know if gnome-db's
    prerequisites would make it prohibitive for Pan's Windows port.



