[Pan-devel] musings on memory consumption


From: Anatoly Vorobey
Subject: [Pan-devel] musings on memory consumption
Date: Thu, 17 Jun 2004 17:28:31 +0300
User-agent: Mutt/1.5.4i

Judging by recent discussions on this mailing list, the developers of
Pan have more or less decided to move to a DB backend. I've been working
towards a different goal for the last few days, trying to make Pan work
for me with very large groups in its current model of storing article
headers in memory. This wasn't motivated by any ideological opposition
to DB backends in general; I merely wanted to be able to use Pan for all
my Usenet needs as soon as possible. Pan is the only GUI newsreader I
can use without yearning for a return to slrn every minute or so.

I ended up with a patch that lets me browse a 1-million-header
newsgroup comfortably on my machine, which is more or less what I
needed. Basically, I use refcounted strings and normalised subjects.
There's a new string type, RString, which stores each unique string only
once, using a global hash table plus a refcount field that tracks how
many times the string is referenced, so that it can be freed when the
refcount drops to 0. RStrings can be used for many strings inside
Article that are currently stored as separate PStrings for each article:
for example, the author's name, the author's email address, newsgroup
names in Xref headers, etc. I may convert all of these to RStrings later
to reduce memory use further.

However, the biggest memory hog is the subject. I wrote a separate
Subject type, which is a kind of normalised subject: it strips the
leading "Re: " and the part number, if present, stores them separately,
and stores the rest as an RString. In particular, this means that all
parts of a multipart article end up referencing the same subject
RString. Additionally, all of article-thread.c had to be rewritten (its
normalisation of subjects when sorting or threading is no longer needed,
and in general it became smaller, faster and much less RAM-hungry), and
every place in Pan that references article subjects needed small
adjustments.
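
A minimal C sketch of the two ideas described above. All names here
(RString, rstring_get_len, rstring_unref, Subject, subject_new,
subject_same_base) are hypothetical and are not taken from the patch. It
assumes a subject is an optional "Re: " prefix, then text, then an
optional trailing "(n/m)" part number, and it uses a small fixed-size
chained hash table in place of whatever global table the real patch uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: a fixed-size chained table stands in for the global RString table. */
#define RSTRING_BUCKETS 4096u

typedef struct RString {
    struct RString *next;  /* next entry in the same bucket */
    unsigned refcount;     /* number of live references */
    char str[];            /* the bytes, stored only once */
} RString;

static RString *rstring_table[RSTRING_BUCKETS];

static unsigned hash_str(const char *s, size_t len)
{
    unsigned h = 5381u;                 /* djb2 string hash */
    for (size_t i = 0; i < len; i++)
        h = h * 33u + (unsigned char)s[i];
    return h % RSTRING_BUCKETS;
}

/* Return the unique RString equal to s[0..len) (no NUL bytes inside),
 * creating it on first use and bumping its refcount on later uses. */
RString *rstring_get_len(const char *s, size_t len)
{
    unsigned bucket = hash_str(s, len);
    for (RString *r = rstring_table[bucket]; r != NULL; r = r->next) {
        if (strncmp(r->str, s, len) == 0 && r->str[len] == '\0') {
            r->refcount++;
            return r;
        }
    }
    RString *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    memcpy(r->str, s, len);
    r->str[len] = '\0';
    r->refcount = 1;
    r->next = rstring_table[bucket];
    rstring_table[bucket] = r;
    return r;
}

/* Drop one reference; unlink and free the string when the count hits 0. */
void rstring_unref(RString *r)
{
    assert(r != NULL && r->refcount > 0);
    if (--r->refcount > 0)
        return;
    RString **pp = &rstring_table[hash_str(r->str, strlen(r->str))];
    while (*pp != r)
        pp = &(*pp)->next;
    *pp = r->next;
    free(r);
}

/* Normalised subject: "Re: " and "(n/m)" kept aside, the rest shared. */
typedef struct {
    int is_reply;          /* a leading "Re: " was stripped */
    int part, parts;       /* from a trailing "(n/m)", or 0 if none */
    RString *base;         /* shared normalised text */
} Subject;

Subject subject_new(const char *s)
{
    Subject subj = {0, 0, 0, NULL};
    if (strncmp(s, "Re: ", 4) == 0) {   /* case-sensitive for brevity */
        subj.is_reply = 1;
        s += 4;
    }
    size_t len = strlen(s);
    const char *open = strrchr(s, '(');
    int n, m, used = 0;
    if (open != NULL && sscanf(open, "(%d/%d)%n", &n, &m, &used) == 2
        && open[used] == '\0') {
        subj.part = n;
        subj.parts = m;
        len = (size_t)(open - s);
        while (len > 0 && s[len - 1] == ' ')
            len--;                      /* drop the space before "(n/m)" */
    }
    subj.base = rstring_get_len(s, len);
    return subj;
}

void subject_free(Subject *subj)
{
    if (subj->base != NULL)
        rstring_unref(subj->base);
    subj->base = NULL;
}

/* Threading and sorting can compare interned bases by pointer. */
int subject_same_base(const Subject *a, const Subject *b)
{
    return a->base == b->base;
}
```

Because every part of a multipart post and every "Re: " follow-up share
one base, deciding whether two articles have the same subject reduces to
a pointer comparison. A fuller version would also need case-insensitive
"Re:" matching and other part-number styles such as "[n/m]".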

After this, starting Pan and loading a 1-million-header newsgroup
results in about 340 MB of memory used, which is tolerable for me. The
slowest responses now come from the GTK header pane, presumably because
the widget has trouble coping with such a large tree: it spends around
10 seconds initialising or freeing the entire header pane (when entering
or leaving the group). I'm not sure whether feeding data into the widget
dynamically, along the lines Evan Martin suggested in a recent message,
would speed that up.

A small nuisance is that Pan doesn't seem to free the articles when I
leave a group and enter another one, even though it loads them from disk
anyway when I re-enter the original group. Why does it behave this way?
If there's no compelling reason for it, I may spend some more time
improving this (perhaps with something as simple as an "Unload this
group" action in the grouplist's context menu?).

Anyway, I understand that the work I've done might not be useful to 
official Pan development if it's been decided to focus on moving to a DB 
backend. I did it primarily for myself, to scratch my own private itch. 
If, however, the project maintainers are interested and think it might
be considered for inclusion, or if other people want to try it out, do
let me know; I can find some time to clean it up, remove debugging junk,
test it some more and put it up for download.

With best wishes,
Anatoly.

-- 
avva