
Re: [Pan-users] Re: performance ...


From: wim . delvaux
Subject: Re: [Pan-users] Re: performance ...
Date: Tue, 9 Dec 2008 13:41:28 +0100
User-agent: KMail/1.9.10

> There was a bug with Ubuntu (Hardy I think but I'm not a *buntu person
> and IDR for sure) where pan on *GNOME* (NOT KDE, and NOT XFCE) would
> become non-responsive for half an hour or more at times.  However, the
> same folks that saw that said it went away when they switched to KDE or
> XFCE (thus the NOTs above).  I guessed it was a shared library issue but
> it was never traced to one, only to GNOME.

I run kubuntu, sorry.

>
> But what you're seeing is normal.  Keep in mind that if pan is saying a
> million articles, that's after combining multiparts.  In some groups,
> that could mean ten or fifty million actual single-part articles.

I was referring to the total article count (not the thread count).  My
largest group's file is 900 MB.

>
> Do you have pan set to download new overviews/headers when you enter the
> group?  It saves the current set including threading when you exit a
> group, and only has to load that from disk, but when you pull down new
> overviews/headers (including when you enter a group if you have pan set
> to do it then), it must process all the new ones that come in, figuring
> out where they plugin to the existing set, combining multiparts, etc.

Yes, I have, but generally I do a CTRL-A, A on the groups so that pan
downloads the headers without showing them (just the article count).

Perhaps using memory maps might speed things up?  Also, the data seems to be
written in ASCII format, requiring a rescan/reparse every time.  Perhaps
saving in binary, which allows even more efficient use of memory maps, might
be useful (as an option only for large groups, perhaps?).  It might not reduce
the size of the file, but it would avoid having to convert lots of integers
(like line numbers, sizes, dates, etc.).  It would also allow reading in
blocks without having to process those blocks.
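
A minimal sketch of the binary-plus-memory-map idea, in Python for brevity
(pan itself is C++).  The record layout and the names RECORD, write_headers
and read_header are made up for illustration; this is not pan's actual
on-disk format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: article number, size in bytes,
# line count, date as Unix time.  "<" means little-endian, no padding.
RECORD = struct.Struct("<QIIq")  # 8 + 4 + 4 + 8 = 24 bytes per record


def write_headers(path, records):
    """Write each (number, size, lines, date) tuple as one packed record."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_header(mm, index):
    """Decode record `index` directly from the mapping.

    Only the 24 bytes of that record are read; no text parsing and no
    scan over the preceding records is needed.
    """
    return RECORD.unpack_from(mm, index * RECORD.size)


def demo():
    """Write 100000 records, map the file, and fetch only the last one."""
    records = [(n, 1000 + n, 20 + n % 5, 1228826488 + n)
               for n in range(100000)]
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_headers(path, records)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                count = len(mm) // RECORD.size
                last = read_header(mm, count - 1)
        return count, last
    finally:
        os.remove(path)
```

Because every record has the same width, record i sits at byte offset
i * 24, so the OS only has to page in the blocks that are actually touched.
Variable-length fields such as subjects would need a separate string table
or an offset index, which this sketch leaves out.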

ASCII is fine if you have only a few data items.

> It may be just strings, but you try working with a few million times say
> a kilobyte of data in headers each (a million times 1 KB in headers each,
> that's a gig right there!), and tell me when you're done that you still
> can't believe it takes pan a half a minute and a gig of memory to process
> it all!

That is true, but when you know you might need to handle 1 GB of data, you
start managing the data cleverly.  Generally you try to save the work that
you did for later use.  E.g. if you have already figured out certain things,
you store that info so that you don't have to figure it out again later on.
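
As a rough illustration of "store what you already figured out", here is a
small Python sketch that keeps a derived result in a sidecar file and reuses
it while the source file is unchanged.  The function names, the
".cache.json" suffix and the line-counting stand-in are all hypothetical;
nothing here reflects how pan stores its data.

```python
import json
import os


def expensive_summary(path):
    """Stand-in for costly work: scan the whole file and count its lines."""
    with open(path, "rb") as f:
        return {"lines": sum(1 for _ in f)}


def cached_summary(path, compute=expensive_summary):
    """Return (summary, was_cached), recomputing only if the file changed."""
    st = os.stat(path)
    # Size plus modification time serves as a cheap "has it changed?" stamp.
    stamp = [st.st_size, st.st_mtime_ns]
    cache_path = path + ".cache.json"
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if cached["stamp"] == stamp:
            return cached["summary"], True
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable cache: fall through and recompute
    summary = compute(path)
    with open(cache_path, "w") as f:
        json.dump({"stamp": stamp, "summary": summary}, f)
    return summary, False
```

The first call pays the full cost and writes the sidecar; later calls only
stat the file and read the small cache, until the source file's size or
modification time changes.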

>
> But if you code and are good with database type stuff, and can make it
> more efficient, I'm sure Charles would like to see it.  I know there were
> 2-3 database coder guys that experimented with various enhancements with
> old-pan, with the results reflected in the changes made to handing in new-
> pan, but if you believe it's possible to do better, please see what you
> can do, and if it's actually better in practice, by all means, file a bug
> with the patches and let Charles know.  It's not like any of us are going
> to complain if it's made faster or less memory intensive! =:^)
Will have a look into it...
>
> Meanwhile, how do you monitor CPU usage?  Are you monitoring it per core,
> or overall only?  Most of new-pan is single-threaded, because Charles had
> gone with multi-threaded in old-pan and found the complexity and thread-
> race bugs just not worth it for the limited increase in performance.
> Instead, new-pan now hatches threads only in limited performance critical
> sections (like when starting multiple connections at once, one place I
> know it's used as I remember Charles fixing a bug I had with it).  So pan
> will likely be using near 100% of a single core, but the others should
> remain mostly idle, I /think/.  (It has been awhile since I did binaries
> and IDR for sure.)

No.  When it was busy doing stuff and blocking other apps from doing
anything, I ran top and it showed pan using about 80% CPU, constantly, for a
certain time.

>
> Also, it may be disk I/O related, if you have a single disk only and that
> group's data isn't in cache yet.  I run a dual dual-core Opteron 290 (2.8
> GHz) here, so have four cores too, but I'm running Gentoo/~amd64 with
> everything compiled to my specific hardware, which will help some (BTW,
> you didn't mention whether you were running 32-bit or 64-bit kubuntu, 4
> gigs on 32-bit is going to be less efficient than 4 gigs on 64-bit), and
> I run a 4-disk kernel/md RAID, with pan's data on RAID-6, which means
> it's two-way striped.  RAID striping really /does/ help, and not just
> with pan; you might be surprised how much.

Yes, I have been considering switching, since

1. my 4 GB is not fully used (because of the graphics card's memory)
2. my disk does indeed seem to be the bottleneck.

However, I would need to completely upgrade my box, and that is a hard job.
Also, I have no experience setting up RAID (I don't even know if my
motherboard supports it).

Thx for the reply
W




