Re: [Pan-users] article cache size


From: Andreas Nastke
Subject: Re: [Pan-users] article cache size
Date: Wed, 01 Oct 2014 15:03:43 +0200
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

what is the reason for this requirement?

if it's really some sort of mass binary download it would
make sense to use pan just for looking at previews and for
grabbing the nzb-files to feed them to some standalone
nzb-reader.


Duncan wrote:
Duncan posted on Tue, 30 Sep 2014 21:10:56 +0000 as excerpted:

bubba posted on Tue, 30 Sep 2014 14:20:51 -0400 as excerpted:

i need a large article cache size, like 200gb.  i can't seem to get
more than 16384 allocated in the pan settings.  i altered the
preferences.xml file:
  <int name='cache-size-megs' value='200000'/>

but it has no effect on the maximum allowed 'size of article cache (in
mib)' in pan -> edit -> edit preferences.  i have a  terabyte free on
that hard drive.

is the max size hard-coded or am i missing something blindingly
obvious?

[Y]ears ago I was the person who asked to bump the max cache size from
1 GiB -- I needed 4 GiB at the time and it was bumped to 20, which was
great.

Later it was bumped again and I had /thought/ that the last time it was
made effectively unlimited.  However, that may be incorrect, or it may
now be running up against the maximum size limit of the type of integer
used.

FWIW, I'm running 12 gig on a dedicated cache partition, here.

Is your pan 32-bit, or 64-bit?  I don't claim to be a coder myself, but
chances are I can make at least some sense of the code and see how it
works, possibly coming up with a patch for you... unless it /is/ running
into a maxint condition and changing that is more complex than a simple
type change.

OK, took a look at the code. Again, I'm no coder, so this is going to look a bit simplistic to those who are, and I might be doing something stupid here, like an off-by-one on the bits. But I can read enough code to analyze it and occasionally come up with patches, and the explanation might be interesting to others who read code even less well than I do, so let's see... This is for current live-git pan, the commit you see in my headers, since I'm posting with pan and that's included:

I did a search on "cache-size" and came up with three hits, all in the pan/gui subdir (line and filename):

985:prefs-ui.cc
975:pan.cc
1149:gui.cc

Relevant pair of lines in prefs-ui.cc:

    w = new_spin_button ("cache-size-megs", 10, 1024*16, prefs);
    l = gtk_label_new(_("Size of article cache (in MiB):"));


OK, this is pretty obviously setting up the preferences GUI, with a max of 1024*16 (MiB), thus 16 GiB. The spinbox in the GUI is confined to that max, *BUT*, that doesn't /necessarily/ mean pan won't honor a higher setting if you set it yourself. In fact, there's precedent for that in the number of connections allowed per server...

History: Pan is GNKSA compliant[1], and while parts of GNKSA are arguably dated, a couple years ago when it came up, the overwhelming feeling on the list appeared to be that it was worth keeping that 100%, because once we (of course really pan's devs, but the feeling was strong enough it gave them a clear signal where users wanted to be) let that slide in one area, where would we eventually end up? Pan would be in danger of losing everything that made pan /pan/.

The problem is that GNKSA specifies that a news client can allow only up to four connections per server, while today, paid news providers often allow 50-ish connections. While most such providers don't seriously limit per-connection speeds and four connections is very often more than enough to saturate a user's Internet link, some users wanted to set more.

The compromise pan has allowed for quite some time rests on the fact that GNKSA specifies how many connections (4 per server) a compliant client can allow a user to set, NOT that a client must limit to that number of connections if a user edits the config file directly. Thus, for many years now, I think since the C++ rewrite introduced as 0.90, while the GUI spinners limit the connections per server to four, pan has actually attempted to use whatever was set in the config file, thus letting the user set a full 50 connections for a server if they want, as long as they do it by directly editing the config file!

So it's quite possible that while pan only allows setting up to 16 GiB cache size in the GUI, it'll actually use more if a user sets it... provided the integer type used doesn't overflow. But the above code just sets up the UI and says nothing about the integer type used to actually store the number. Let's see what the other hits have to say...

In pan.cc the relevant lines are:

  if (gui)
  {
    // load the preferences...
...
    // instantiate the backend...
    const int cache_megs = prefs.get_int ("cache-size-megs", 10);


OK, so we have (signed) int. That may be a problem, as the spec says int is only required to hold 16 bits (tho many platforms standardize on larger, 32-bit or 64-bit ints, for instance). Since it's a signed int (not uint, unsigned int), one bit is reserved for the sign, leaving 15 bits. 1024 is 2^10 so we have five bits to play with, but 0 is a number too and counts as positive, so the range is one less on the positive side: 32*1024-1=32767.

We're using four of those bits for that 16*1024 above. Upping that to five bits, for 32*1024-1, may well be possible, but beyond that could get complicated for some archs at least.

Since a negative cache size doesn't make a lot of sense, uint would be an option, gaining us a bit, to 64 GiB, but that's still way under your desired 200 GiB.

Changing that to a long might be possible but gets a bit complex for my non-coder abilities (tho if I were determined enough I'd definitely experiment with it for my own patches).

Of course if your platform defines an int as 32-bit or 64-bit, then upping it to say 256*1024 shouldn't be a problem for that platform, and you could certainly apply the patch yourself. I don't know enough about the various platforms (both MS Windows and *ix) pan runs on to know if it's safe to stay at int everywhere or if a switch to long is required. But 32-bit should be plenty in any case, since we're dealing in MiB already: a signed 32-bit int tops out at 2^31-1 MiB, which (if my math is correct) is just under 2 PiB, and I guess that should be a /few/ years away, anyway! =:^)
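For anyone who wants to check the bit arithmetic, here is a minimal C++ sketch. The helper names (max_mib, mib_to_gib, fits) are made up for illustration and are not part of pan; the only assumption is a two's-complement integer of the stated width holding a cache size in MiB.

```cpp
#include <cassert>
#include <cstdint>

// Largest value (taken here as a MiB count) that an integer of the given
// width can hold. A signed type loses one bit to the sign.
// `bits` is expected to be between 2 and 64.
std::uint64_t max_mib(unsigned bits, bool is_signed) {
    const unsigned value_bits = is_signed ? bits - 1 : bits;
    if (value_bits >= 64) return UINT64_MAX;
    return (std::uint64_t{1} << value_bits) - 1;
}

// Whole GiB in a MiB count (1 GiB = 1024 MiB), rounded down.
std::uint64_t mib_to_gib(std::uint64_t mib) { return mib / 1024; }

// True if a cache size of `wanted_mib` MiB is representable in the type.
bool fits(std::uint64_t wanted_mib, unsigned bits, bool is_signed) {
    return wanted_mib <= max_mib(bits, is_signed);
}

// Worked values for the cases discussed in this thread.
void check_examples() {
    assert(max_mib(16, true) == 32767);    // signed 16-bit: just under 32 GiB
    assert(max_mib(16, false) == 65535);   // unsigned 16-bit: just under 64 GiB
    assert(!fits(200 * 1024, 16, false));  // 200 GiB does not fit in 16 bits
    assert(fits(200 * 1024, 32, true));    // but fits easily in signed 32-bit
}
```

On a platform where int is 32 bits, the 200000 from the preferences.xml edit is therefore representable; whether pan actually honors it is a separate question.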

Meanwhile, I'd suggest trying say 32*1024-1 (=32767) MiB, and see if pan actually uses that, first, regardless of what the GUI says. If my point above is correct, pan should take that even if the GUI says differently. Tho I'm not sure how pan actually manages cache -- if it lets it get a bit above that and then deletes back down to it, that might blow up, while if it gets to that and then deletes say a GiB to make room, it should be fine.

If that works, then try above that, say your 200 GiB. If your platform uses 32-bit or 64-bit ints, that'll be fine. If not, it won't.


Meanwhile, our last hit, in gui.cc:

void GUI :: prefs_dialog_destroyed (GtkWidget *)
{

  const Quark& group (_header_pane->get_group());
  if (!group.empty() && _prefs._rules_changed)
  {
    _prefs._rules_changed = !_prefs._rules_changed;
    _header_pane->rules(_prefs._rules_enabled);
  }
  _cache.set_max_megs(_prefs.get_int("cache-size-megs",10));

}

This appears to be where pan actually loads the setting when the prefs dialog is closed. Note that _cache.set_max_megs is OUTSIDE the if-conditional, so it applies WHENEVER the prefs dialog is closed. That means whether you've actually changed anything or not.

Which unless I'm mistaken means that if you set a larger cache manually, you'll have to be very careful NOT TO OPEN the prefs dialog, since the moment you close it, pan will reset to the max 16*1024, 16 GiB cache size, thus triggering a delete of anything above that in the cache! That could make for quite some frustration -- I know as I remember similar problems way back when I originally requested that bump from 1 GiB max, back in the day.
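To make the suspected failure mode concrete, here is a small C++ sketch. It is NOT pan's code: FakePrefs and open_and_close_dialog are hypothetical stand-ins, and the sketch assumes that a range-limited spin button writes its clamped value back to the preference while the dialog is open. If pan doesn't do that, the hand-edited value would survive.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a preferences store; not pan's real class.
struct FakePrefs {
    std::map<std::string, int> ints;

    int get_int(const std::string& key, int fallback) const {
        const auto it = ints.find(key);
        return it == ints.end() ? fallback : it->second;
    }
    void set_int(const std::string& key, int value) { ints[key] = value; }
};

// ASSUMPTION: the spin button shows the stored value clamped to [lo, hi]
// and stores that clamped value again. Returns the size (in MiB) that
// would then be handed to the cache when the dialog is closed.
int open_and_close_dialog(FakePrefs& prefs, int lo, int hi) {
    const int stored = prefs.get_int("cache-size-megs", 10);
    const int shown = std::max(lo, std::min(hi, stored));
    prefs.set_int("cache-size-megs", shown);
    return prefs.get_int("cache-size-megs", 10);
}

// A hand-edited 200000 MiB setting drops to the GUI maximum.
void check_examples() {
    FakePrefs prefs;
    prefs.set_int("cache-size-megs", 200000);  // edited in preferences.xml
    assert(open_and_close_dialog(prefs, 10, 1024 * 16) == 16384);
}
```

Under that assumption, merely opening and closing the dialog shrinks the limit from 200000 MiB to 16384 MiB, which is exactly the case to avoid.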

That concludes my initial code analysis. I don't know whether you even build from source and guess that if you do, either you can patch as well or you can at least figure out what to try and manually change from the discussion above, so I won't post patches ATM, anyway.

Meanwhile, filling in the alternatives I mentioned earlier...
The alternative would be modifying your download style a bit.  Pan's
assumptions about how people do downloading are obviously different than
yours and mine, and a huge cache isn't needed for its way.  But some of
us obviously use pan differently, and while it generally works, it's
not quite the easy fit it would be if those assumptions were different. More about that later when I have more time and/or have taken a look.

What I was referring to here is simply the fact that pan seems to assume that binary downloaders direct-download, that is, find what they want and tell pan to save attachments directly. Pan really doesn't use a lot of cache in this mode because it's deleting posts as fast as it's downloading them and saving the attachments, so its 10 MB default cache is generally enough.

OTOH, there's a very different type of binary group usage that I do, and with your request, that I guess you do as well. I tend to want to set pan up to download anything that looks interesting to cache and then go away for awhile, say to work or to sleep. When I get back or wake up several hours later, all those posts are cached locally and I can browse thru them as I like, basically instantly, sorting as I go, saving off attachments that I decide I really want to keep and then normally deleting the messages, deleting others without permanent saving if after looking at them I decide they're not worth keeping. Of course this requires a *MUCH* larger cache, generally large enough to contain the entire download session, since in the first stage it's all only downloaded to cache, where it must remain until I've gone thru it.

But, as attachment sizes and volume increase, there comes a point at which the pre-cache method doesn't work so well. For still images and even mp3s and low-ish resolution mpegs of a few minutes max, it still works reasonably well, and for that a cache size of say 16 GiB should be fine since that's more or less what you can sort thru after a single session anyway.

But if you're looking at a 200 GiB cache, I'm guessing you're doing rather larger attachments, ISO-images and/or half-hour minimum possibly HD-resolution TV programs and feature-length movies. Several hundred MiB files minimum, 4.7 GiB DVD images, possibly full 20-ish GiB Bluray images, and/or perhaps whole TV series at a time.

For this, pre-caching really doesn't work so well anyway, in part because pan doesn't direct-preview them as it does still-images. As such, the direct-save method becomes the only practical method, both because it doesn't require the huge cache, and because you have to save the files off to view them anyway.

So IMHO anyway, you might wish to reconsider your download method. At your volume the direct-save method is likely to be most practical in any case, and shouldn't require that huge cache.

One other possibility.  It's possible (at least on Linux, don't know
about MS Windows or Apple OSX) to run multiple different pan instances
at once.  If you are active in enough groups and they split by subject
well enough, you could do that.  There's a variable that can be set to
point pan at a directory other than its default, and you can set this
differently to have multiple different pan setups.  I do that here.  If
multiple pan setups each with a 16 gig cache would work...

The environment variable in question is PAN_HOME. Here I set it in a wrapper script to one of my three pan profiles, with one script for each profile: bin, text, test. Of course if you want and the groups are separate enough, you can make that say mp3s, dvds, tvprogs. Or whatever.

Specifically, for my text groups I set PAN_HOME to ~/pan/text, so it uses the settings there instead of in the default ~/.pan2. That lets me have different settings (including different cache sizes) for each of my profiles.

Then where appropriate, I use symlinks in each profile, pointing to files in ~/pan/globals for some files (like my shared scorefile), and for my binary profile, pointing my cache to a separate, dedicated partition, that's only used for cache for my pan binary profile.
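As a rough illustration of the lookup order described above (PAN_HOME if set, otherwise the default ~/.pan2), here is a C++ sketch. The function names are made up, and this is not copied from pan's source; treating an empty PAN_HOME as unset is also an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Picks the profile directory: the PAN_HOME value if it is set and
// non-empty, otherwise <home_dir>/.pan2. Hypothetical helper, not pan's.
std::string resolve_pan_home(const char* pan_home_env,
                             const std::string& home_dir) {
    if (pan_home_env != nullptr && *pan_home_env != '\0')
        return pan_home_env;
    return home_dir + "/.pan2";
}

// Same lookup, reading the real process environment.
std::string pan_home_from_environment() {
    const char* home = std::getenv("HOME");
    return resolve_pan_home(std::getenv("PAN_HOME"), home ? home : "");
}

// One profile per wrapper script, as described above.
void check_examples() {
    assert(resolve_pan_home(nullptr, "/home/me") == "/home/me/.pan2");
    assert(resolve_pan_home("/home/me/pan/text", "/home/me") ==
           "/home/me/pan/text");
}
```

Each wrapper script then only has to export a different PAN_HOME before starting pan, and every profile gets its own preferences.xml and thus its own cache size.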


Meanwhile, yet another possibility I didn't think of earlier...

You can set up a local news server such as leafnode. You'd then configure it to do the mass downloading from your NSP into its cache, and could then simply point pan at your local leafnode or whatever news server. Then you could probably leave pan's cache at the default 10 MiB size and/or even place it in tmpfs (a RAM-based filesystem), since your server with its own cache of whatever size would be local anyway.

So here's hoping you find at least some of that helpful... =:^)

---
[1] GNKSA: Good Net-Keeping Seal of Approval. While even its keepers acknowledge it's a bit outdated today, back in the day it served as a widely accepted guideline for news-client acceptable net behavior. See the pan website for details about pan's compliance and a link, and the list archives for previous discussions about pan's compliance here.


--
Mit besten Grüßen / Kind Regards

Andreas Nastke
IT System Management

g/d/p Markt- und Sozialforschung GmbH
Ein Unternehmen der Forschungsgruppe g/d/p
Richardstr. 18
D-22081 Hamburg
Fon: +49 (0)40 / 29876-117
Fax: +49 (0)40 / 29876-127
address@hidden
www.gdp-group.com

Sitz der Gesellschaft ist Hamburg, Handelsregister Hamburg, HRB 40482
Geschäftsführer: Christa Braaß, Volker Rohweder
