[Pan-users] Re: performance ...


From: Duncan
Subject: [Pan-users] Re: performance ...
Date: Tue, 9 Dec 2008 06:29:45 +0000 (UTC)
User-agent: Pan/0.133 (House of Butterflies)

address@hidden posted
address@hidden, excerpted below, on 
Tue, 09 Dec 2008 04:27:04 +0100:

> I have a quad core with 4 GB of memory running Kubuntu Hardy.
> 
> I do follow a large set of newsgroups with lots of articles.
> 
> It seems to me that PAN could use some performance tuning.  I cannot
> believe that it takes so long to sort some 1000000 articles.  They are
> just strings?
> 
> Sometimes, while switching between groups, Pan eats about 80% CPU for
> 15-30 seconds, effectively eating up my PC. Also it used about 1GB of
> non-shared memory.

There was a bug with Ubuntu (Hardy I think but I'm not a *buntu person 
and IDR for sure) where pan on *GNOME* (NOT KDE, and NOT XFCE) would 
become non-responsive for half an hour or more at times.  However, the 
same folks that saw that said it went away when they switched to KDE or 
XFCE (thus the NOTs above).  I guessed it was a shared library issue but 
it was never traced to one, only to GNOME.

But what you're seeing is normal.  Keep in mind that if pan is saying a 
million articles, that's after combining multiparts.  In some groups, 
that could mean ten or fifty million actual single-part articles.

Do you have pan set to download new overviews/headers when you enter the 
group?  It saves the current set including threading when you exit a 
group, and only has to load that from disk, but when you pull down new 
overviews/headers (including when you enter a group if you have pan set 
to do it then), it must process all the new ones that come in, figuring 
out where they plug into the existing set, combining multiparts, etc.
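To make the "combining multiparts" step a bit more concrete, here's a rough Python sketch of the idea, not pan's own code: only the newly downloaded headers get parsed, and each one is filed under a combined article keyed by its subject with the part counter stripped.  The "(n/m)" subject convention, the dict layout, and the names split_subject and merge_new_headers are all assumptions made up for illustration.

```python
import re

# Trailing part counter such as "(03/12)". Assumption: posters follow the
# common "(n/m)" convention; real-world subjects are far messier.
PART_RE = re.compile(r"\s*\((\d+)/(\d+)\)\s*$")


def split_subject(subject):
    """Return (base_subject, part, total); single-part posts give (subject, 1, 1)."""
    m = PART_RE.search(subject)
    if m is None:
        return subject, 1, 1
    return subject[:m.start()], int(m.group(1)), int(m.group(2))


def merge_new_headers(articles, new_headers):
    """Fold freshly downloaded (message_id, subject) pairs into `articles`,
    a dict mapping (base_subject, total) -> {part: message_id}.
    Entries already loaded from disk are left as-is; only new ones are parsed."""
    for message_id, subject in new_headers:
        base, part, total = split_subject(subject)
        articles.setdefault((base, total), {})[part] = message_id
    return articles


# Four headers arrive over two fetches and collapse into two articles.
articles = {}
merge_new_headers(articles, [("<a@x>", "song.mp3 (1/3)"), ("<b@x>", "song.mp3 (2/3)")])
merge_new_headers(articles, [("<c@x>", "song.mp3 (3/3)"), ("<d@x>", "hello")])
print(len(articles))  # 2
```

The point is that the work grows with the number of *new* headers per fetch, which is why pulling down a big batch on group entry costs time even though the saved set loads quickly.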

Also, as Jim mentions, old-pan was /seriously/ scale-challenged, and 
would start having trouble at 100k individual overviews/headers (it 
didn't combine them like new-pan does).  With a couple million, even if 
you had gigs and gigs of RAM, it would sit and churn for an hour or more, 
and you could forget anything above that.  It just didn't scale well at 
all; a couple million overviews was seriously pushing it, period.  
New-pan (what *buntu ships) had quite a lot of serious work go into it 
to improve memory use and scaling, and it actually does quite well now, in general 
scaling linearly or better.  With a reasonable amount of memory it'll 
handle 10 or 20 million overviews without issue, tho processing that many 
overviews does take time/memory/cpu, no way around it.  As I said, if 
you're working in a multipart group (say mp3s) and pan says a million 
headers unread, that's likely to be a good ten million individual message 
parts, more on movie and iso groups, less on jpeg groups.

One of the things pan now does to save memory is track strings and 
combine where possible.  This is why it displays multiparts as a single 
part, thus being able to track the subject and author only once for the 
multipart.  However, it does more than that.  If you look at pan's data 
files, you'll see it counts the number of times an author's name occurs, for 
instance, and for regulars will store it in memory only once, using a 
much shorter reference the additional times.  All this sort of stuff it 
sorts out and plugs into its database system as it's downloading the 
overviews.  Then when you leave the group, it saves it and pulls the next 
group's data off of disk.
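The "store it once, use a much shorter reference the additional times" trick is usually called string interning.  Here's a minimal Python sketch of the general technique, assuming a simple table of integer ids; it is not pan's actual data structure, the class name StringPool is hypothetical, and unlike what I described above it interns every string rather than only the frequent ones.

```python
from collections import Counter


class StringPool:
    """Store each distinct string once and hand out small integer ids."""

    def __init__(self):
        self._ids = {}            # string -> id
        self._strings = []        # id -> string (the single stored copy)
        self.counts = Counter()   # id -> number of times it was interned

    def intern(self, s):
        """Return the id for s, storing the text only on first sight."""
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        self.counts[sid] += 1
        return sid

    def lookup(self, sid):
        """Recover the original string from its id."""
        return self._strings[sid]


# Three headers, two distinct authors: the repeated name is stored once.
pool = StringPool()
authors = ["alice <a@example.com>", "bob <b@example.com>", "alice <a@example.com>"]
ids = [pool.intern(a) for a in authors]
print(ids)  # [0, 1, 0]
```

Each header then carries a small integer instead of a full copy of a name that may be dozens of bytes long, and a regular poster with thousands of posts costs one stored string.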

It may be just strings, but try working with a few million headers at, 
say, a kilobyte of data each (a million headers at 1 KB each is a gig 
right there!), and then tell me you still can't believe it takes pan half 
a minute and a gig of memory to process it all!
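Spelled out, assuming my rough "say, a kilobyte" per header and decimal units (1 GB = 10^9 bytes), the raw text alone works out as below; the helper name is just for illustration, and real header sizes vary a lot by group.

```python
def raw_header_bytes(num_headers, bytes_per_header=1000):
    """Size of the header text alone, before any index, threading, or
    bookkeeping structures are built on top of it."""
    return num_headers * bytes_per_header


# One million combined articles at roughly 1 KB of headers each:
print(raw_header_bytes(1_000_000) / 1e9, "GB")   # 1.0 GB
# The same group counted as ten million individual multipart pieces:
print(raw_header_bytes(10_000_000) / 1e9, "GB")  # 10.0 GB
```

That second figure is what has to be downloaded and parsed before the combining and interning described above can shrink it.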

But if you code and are good with database type stuff, and can make it 
more efficient, I'm sure Charles would like to see it.  I know there were 
2-3 database coder guys that experimented with various enhancements with 
old-pan, with the results reflected in the changes made to handling in 
new-pan, but if you believe it's possible to do better, please see what you 
can do, and if it's actually better in practice, by all means, file a bug 
with the patches and let Charles know.  It's not like any of us are going 
to complain if it's made faster or less memory intensive! =:^)

Meanwhile, how do you monitor CPU usage?  Are you monitoring it per core, 
or overall only?  Most of new-pan is single-threaded, because Charles had 
gone with multi-threaded in old-pan and found the complexity and thread-
race bugs just not worth it for the limited increase in performance.  
Instead, new-pan now hatches threads only in limited performance critical 
sections (like when starting multiple connections at once, one place I 
know it's used as I remember Charles fixing a bug I had with it).  So pan 
will likely be using near 100% of a single core, but the others should 
remain mostly idle, I /think/.  (It has been a while since I did binaries 
and IDR for sure.)
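The per-core vs. overall distinction matters because one core pinned at 100% on a quad-core only shows as 25% overall.  On Linux the per-core numbers come from /proc/stat, whose cpu and cpuN lines hold cumulative time counters (user, nice, system, idle, iowait, irq, softirq, steal, then guest columns).  A minimal sketch, assuming that layout as described in proc(5); the function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Map 'cpu' (aggregate) and 'cpu0', 'cpu1', ... to (idle, total) jiffies."""
    result = {}
    for line in text.splitlines():
        if not line.startswith("cpu"):
            continue
        name, *fields = line.split()
        # user, nice, system, idle, iowait, irq, softirq, steal; the later
        # guest columns are already included in user/nice, so skip them.
        values = [int(v) for v in fields[:8]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        result[name] = (idle, sum(values))
    return result


def cpu_usage(before, after):
    """Busy percentage per CPU between two parse_proc_stat() snapshots."""
    usage = {}
    for name, (idle1, total1) in after.items():
        idle0, total0 = before[name]
        elapsed = total1 - total0
        busy = elapsed - (idle1 - idle0)
        usage[name] = 0.0 if elapsed == 0 else 100.0 * busy / elapsed
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Read the live counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return cpu_usage(before, after)
```

Running sample() while pan churns on a group switch would show whether one cpuN entry sits near 100% while the rest idle, the pattern you'd expect from mostly single-threaded code.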

Also, it may be disk I/O related, if you have a single disk only and that 
group's data isn't in cache yet.  I run a dual dual-core Opteron 290 (2.8 
GHz) here, so have four cores too, but I'm running Gentoo/~amd64 with 
everything compiled to my specific hardware, which will help some (BTW, 
you didn't mention whether you were running 32-bit or 64-bit kubuntu, 4 
gigs on 32-bit is going to be less efficient than 4 gigs on 64-bit), and 
I run a 4-disk kernel/md RAID, with pan's data on RAID-6, which means 
it's two-way striped.  RAID striping really /does/ help, and not just 
with pan; you might be surprised how much.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman