
Re: [Pan-users] pan for Windows crashes when reading large newsgroup


From: Duncan
Subject: Re: [Pan-users] pan for Windows crashes when reading large newsgroup
Date: Fri, 19 Oct 2012 14:26:56 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f91bd24 /usr/src/portage/src/egit-src/pan2)

K Shen posted on Fri, 19 Oct 2012 07:08:57 +0100 as excerpted:

> Hi,

Hello. =:^)

Before we get into the message, let me remind you to please turn off the 
HTML.  Being a pan user you probably already know how annoying it can be, 
seeing that in pan... which many here use for this list, via gmane.org's 
list2news service.

> I am using pan newsreader for Windows to read news for several years
> now, but in the past month or so, I  have started to see regular crashes
> of pan when reading a newgroup with a large number of articles.[...] 
> without such problems previously [...] traffic [may] have increased
> [...]
> 
> fault module name is libcairo-2.dll. After a few crashes, I have
> noticed that the crash happens when the memory used by pan.exe is
> around 1,800,000KB. [...]
> 
> I have just had another crash, while reading in the headers. [T]he
> Commit memory for pan.exe was 1,896,692KB.
> 
> I have been using a 32 bit x86 Windows XP laptop with 2G of real memory
> up to 3-4 months ago, which was replaced by a 64 bit x86-64 Windows 7
> laptop with 4G of real memory. This was about 1-2 months before I
> noticed the crash problem, and I don't know if this new configuration is
> important for the crashes (I have not seen the crashes on the old
> laptop).
> 
> Does anyone know if the crash is caused by the amount of memory
> used/number of headers? Is there any known reason why the crash seem to
> happen when the memory used by the process is around 1.8-1.9G?

Short version: You're very likely running into the infamous 32-bit memory 
limits that are the reason the computing world is moving to 64-bit.

Rather longer version:  In general, the single-byte-addressable flat-
address-space limit of a 32-bit system is 4 GB.  However, this is the 
total of the "virtual" address space, which must be split between several 
different uses, primarily between user-space and kernel-space, with the 
most common split being 2:2 user/kernel, two gigs each, user low, kernel 
high.[1]

AFAIK, many MS 32-bit consumer/home/pro kernels have a hard 2G/2G split
(tho the server editions generally use PAE[2] mode as Linux can, and
thus can address far more physical memory, even on 32-bit).  Linux 32-bit
x86 kernels normally default to a 3G/1G user/kernel split instead, but of
course it's source available and can be rebuilt using one of the other
available options.  These include a 2G/2G split, a 4G/4G option that
actually dedicates a separate 4 gigs to each and switches between them
every time it switches user-mode/kernel-mode (lower efficiency, but if
your 32-bit app needs >3 gigs...), and the 64-gig max PAE mode[2], also
less efficient due to the additional layer of indirection it uses.

Switching to a 64-bit kernel does allow the /kernel/ to natively access 
memory above the 4 GB barrier, but if you're running 32-bit apps, they're 
still limited to their old sub-4-gig, and possibly sub-2-gig, size.  I 
don't know enough about Windows to know how it manages user-space 
limits.  In theory, I /believe/ 32-bit apps should have access to a full 
4 gigs of virtual userspace (they do on Linux when running on a 64-bit 
kernel), but it's very likely that there remains either an MS kernel 
enforced 2-gig barrier, or the default compiler options used when 
building an app make that assumption, maybe both.

Getting back to pan, on large groups with many millions of headers, pan 
does unfortunately use gigs of memory, because at present, it builds a 
tree of all that header and threading data in memory.  This is actually 
rather better than it used to do... I remember when pan would run into 
trouble at 100k-200k headers!  One of the things done to help manage 
memory usage since then, is that now it does string-combining for 
repeated strings such as author and subject, keeping only one copy of an 
author name string in memory and reducing the others to references to the 
first, for instance, and keeping only one copy of the subject line for 
multi-part posts, which it auto-combines and displays as a single entry.
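
To make the string-combining idea concrete, here's a rough Python sketch.
It is NOT pan's actual code; the StringPool class and the example author
string are made up purely for illustration.  The point is only that every
header ends up referencing one shared copy of a repeated string:

```python
class StringPool:
    """Keep one shared copy of each distinct string (hypothetical helper)."""

    def __init__(self):
        self._pool = {}

    def intern(self, s):
        # setdefault stores s the first time it is seen and returns the
        # already-stored copy on every later call with an equal string
        return self._pool.setdefault(s, s)

    def __len__(self):
        return len(self._pool)


pool = StringPool()

# Build the author strings at runtime, as if each came from its own header
raw_authors = ["".join(["Some Poster ", "<poster@example.invalid>"]) for _ in range(3)]
headers = [{"author": pool.intern(a), "part": i + 1} for i, a in enumerate(raw_authors)]

# Every header now references the very same string object
shared = all(h["author"] is headers[0]["author"] for h in headers)
distinct_copies = len(pool)
print(shared, distinct_copies)
```

With millions of headers and comparatively few distinct authors, the
saving comes from storing a reference per header instead of a full copy.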

For many years (since well before Charles left), there has been talk of 
switching to a database backend of some type, perhaps sqlite-based, to 
track all this data, so only a relatively small bit of it would need to 
be in memory at once.  However, Charles left as lead dev before it was 
ever implemented.  I suspect he wasn't familiar with coding for databases,
and they're notoriously hard to get correct for the inexperienced, with
crash and data-loss bugs being extremely common, so he was hesitant.
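
For anyone wondering what "only a relatively small bit in memory at once"
might look like, here's a minimal sketch using Python's sqlite3 module.
The table layout, column names and the fetch_page helper are my own
assumptions for illustration; this says nothing about how a real pan
backend would be designed:

```python
import sqlite3

# ":memory:" keeps the example self-contained; a real backend would
# point at a file on disk so the data need not all live in RAM
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE headers ("
    " msgid TEXT PRIMARY KEY, subject TEXT, author TEXT, posted INTEGER)"
)
db.execute("CREATE INDEX headers_by_date ON headers (posted)")

# Made-up sample data standing in for downloaded headers
rows = [(f"<{i}@example.invalid>", f"subject {i}", "poster", i) for i in range(1000)]
db.executemany("INSERT INTO headers VALUES (?, ?, ?, ?)", rows)
db.commit()


def fetch_page(conn, offset, limit=50):
    """Return one screenful of headers, newest first."""
    cur = conn.execute(
        "SELECT msgid, subject, author, posted FROM headers"
        " ORDER BY posted DESC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()


page = fetch_page(db, 0)
total = db.execute("SELECT COUNT(*) FROM headers").fetchone()[0]
print(len(page), "of", total, "headers loaded")
```

Only the requested page is pulled into the program; the rest stays in
the database until it's scrolled to.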

Then pan was basically abandoned code for a couple years, then adopted by 
someone who could maintain it but didn't have the time to really add new 
features, and only recently (a year or so ago) has Heinrich Mueller come 
along, with all the new features he has implemented at such a furious 
pace!

And he's working on the disk-backed database backend, but as I said, 
databases are notoriously HARD to get right the first time, so even when 
he does have something out to test, it's quite likely it'll be some time 
before that code is actually reasonably stable.

Meanwhile, you appear to still be running a 32-bit pan on your 64-bit MS
kernel.  Once pan hits 1.8-1.9 gigs, various other overhead pushes it
over the 2-gig barrier into what would often be kernel space on a 32-bit
system, and which is apparently still reserved, unavailable to your
32-bit pan, on your now 64-bit system.
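
As a quick sanity check on the numbers, here's the arithmetic as a small
Python sketch.  It assumes a 2G/2G split and that the "KB" in your commit
figure means 1024 bytes; both are assumptions on my part:

```python
KIB = 1024

address_space_kib = 2**32 // KIB         # 4 GiB of 32-bit virtual address space
user_space_kib = address_space_kib // 2  # user half of an assumed 2G/2G split
commit_kib = 1_896_692                   # commit size reported at the crash

headroom_kib = user_space_kib - commit_kib
headroom_mib = headroom_kib / KIB
print(f"user space {user_space_kib} KiB, headroom {headroom_kib} KiB "
      f"(~{headroom_mib:.0f} MiB)")
```

That leaves under 200 MiB of address space for everything else mapped
into the process, so one more large allocation failing at about that
point is entirely plausible.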

Actually, for the biggest groups on servers with high retention
(giganews is known for this, some of the others have the problem too on
their heaviest-traffic groups), even a 64-bit 8-gig system can run into
problems trying to get and process ALL headers.  Someone calculated what
it would take to handle them and posted the results at one point, and 
IIRC, it was something over 16 gigs, 17-ish, I think.  FWIW I have 16 gig 
now (tho I haven't done binary groups in years), so it'd push even my 
system into swap some.

So you're kind of between a rock and a hard place.  Until Heinrich comes 
out with that database backend I've seen him mention a few times, your 
options include switching to a 64-bit pan, continuing with the N-days 
header thing, or trying something else that HAS implemented a database 
backend.  It's /possible/ there are some options you can tweak to let you
access a full 4 gigs with a 32-bit pan, but that's ultimately likely to
run into the same issues as well.  You REALLY need either the
still-being-coded pan database backend (Heinrich would have to tell you
its status; he could be barely started, or just about ready to pop the
announcement, I simply don't know), or a 64-bit pan and likely 8 or 16
gigs RAM, or another news harvesting alternative that already has such a
database backend.  It's really that simple.

Of course since you didn't post the server and group name (not that I 
blame you, the group name can be... rather private info to be posting), 
it's also possible that it's not that big after all, and that you're 
running into some other problem.  However, pan /is/ known to have this 
problem especially on 32-bit, and that close to the 2-gig barrier on a 
group you did say was heavy traffic, chances are it really /is/ the 
memory barrier you're hitting.  Unfortunately...

---
[1] It's not relevant here, but complicating matters further is the fact
that the top of the 32-bit space (often a half-gig to a gig, tho on
machines with lots of graphics memory it can near two gigs) is reserved
for legacy 32-bit PCI device hardware I/O address usage, even on 64-bit
machines.  For
machines with 3+ gigs of physical RAM, this presents a problem as the PCI 
hardware I/O area masks any physical memory located at these reserved 
addresses.  The solution is to remap this otherwise hidden physical 
memory above the 4GB barrier, but for a number of years many BIOSes 
didn't come with this option, and people with these machines who upgraded 
to 4 GB simply wasted between a quarter gig and a full gig of RAM, as it 
was hidden behind the PCI hardware IO area and thus unusable.  That's 
also why say an 8-gig physical-ram machine will often count up to 9 or 10 
gigs in its POST (power-on-self-test) -- it's remapping up to two gigs up 
above the PCI hardware IO memory hole.

http://en.wikipedia.org/wiki/3_GB_barrier

[2] PAE, Physical Address Extension:
http://en.wikipedia.org/wiki/Physical_Address_Extension

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



