Re: [Pan-users] article cache size

From:	Andreas Nastke
Subject:	Re: [Pan-users] article cache size
Date:	Wed, 01 Oct 2014 15:03:43 +0200
User-agent:	Thunderbird 2.0.0.24 (Windows/20100228)

what is the reason for this requirement?

if it's really some sort of mass binary download it would
make sense to use pan just for looking at previews and for
grabbing the nzb-files to feed them to some standalone
nzb-reader.


Duncan wrote:

Duncan posted on Tue, 30 Sep 2014 21:10:56 +0000 as excerpted:

> bubba posted on Tue, 30 Sep 2014 14:20:51 -0400 as excerpted:
>
>> i need a large article cache size, like 200gb.  i can't seem to get
>> more than 16384 allocated in the pan settings.  i altered the
>> preferences.xml file:
>>   <int name='cache-size-megs' value='200000'/>
>>
>> but it has no effect on the maximum allowed 'size of article cache (in
>> mib)' in pan -> edit -> edit preferences.  i have a  terabyte free on
>> that hard drive.
>>
>> is the max size hard-coded or am i missing something blindingly
>> obvious?
>
> [Y]ears ago I was the person who asked to bump the max cache size from
> 1 GiB -- I needed 4 GiB at the time and it was bumped to 20, which was
> great.
>
> Later it was bumped again and I had /thought/ that the last time it was
> made effectively unlimited.  However, that may be incorrect, or it may
> now be running up against the maximum size limit of the type of integer
> used.
>
> FWIW, I'm running 12 gig on a dedicated cache partition, here.
>
> Is your pan 32-bit, or 64-bit?  I don't claim to be a coder myself, but
> chances are I can make at least some sense of the code and see how it
> works, possibly coming up with a patch for you... unless it /is/ running
> into a maxint condition and changing that is more complex than a simple
> type change.

OK, took a look at the code.  Again I'm no coder so this is going to
look a bit simplistic to them and I might be doing something stupid here
like off-by-one on the bits or something, but I can read enough code to
analyze and come up with patches occasionally, and the explanation might
be interesting to others who don't read code even as well as I do, so
let's see...  This is for current live-git pan, the commit you see in my
headers, since I'm posting with pan and that's included:

I did a search on "cache-size" and came up with three hits, all in the
pan/gui subdir (line and filename):

985:prefs-ui.cc
975:pan.cc
1149:gui.cc

Relevant pair of lines in prefs-ui.cc:

    w = new_spin_button ("cache-size-megs", 10, 1024*16, prefs);
    l = gtk_label_new(_("Size of article cache (in MiB):"));

OK, this is pretty obviously setting up the preferences GUI, with a max
of 1024*16 (MiB), thus 16 GiB.  The spinbox in the GUI is confined to
that max, *BUT*, that doesn't /necessarily/ mean pan won't honor a
higher setting if you set it yourself.  In fact, there's precedent for
that in the number of connections allowed per server...

History: Pan is GNKSA compliant[1], and while parts of GNKSA are
arguably dated, a couple years ago when it came up, the overwhelming
feeling on the list appeared to be that it was worth keeping that 100%,
because once we (of course really pan's devs, but the feeling was strong
enough it gave them a clear signal where users wanted to be) let that
slide in one area, where would we eventually end up?  Pan would be in
danger of losing everything that made pan /pan/.

The problem is that GNKSA specifies that a news client can allow only up
to four connections per server, while today, paid news providers often
allow 50-ish connections.  While most such providers don't seriously
limit per-connection speeds and four connections is very often more than
enough to saturate a user's Internet link, some users wanted to set
more.

The compromise pan has allowed for quite some time rests on the fact
that GNKSA specifies how many connections (4 per server) a compliant
client can allow a user to set, NOT that a client must limit to that
number of connections if a user edits the config file directly.  Thus,
for many years now, I think since the C++ rewrite introduced as 0.90,
while the GUI spinners limit the connections per server to four, pan has
actually attempted to use whatever was set in the config file, thus
letting the user set a full 50 connections for a server if they want, as
long as they do it by directly editing the config file!

So it's quite possible that while pan only allows setting up to 16 GiB
cache size in the GUI, it'll actually use more if a user sets it...
provided the integer-type used doesn't overflow.  But the above code
just sets up the UI and says nothing about the integer type used to
actually store the number.  Let's see what the other hits have to say...

In pan.cc the relevant lines are:

  if (gui)
  {
    // load the preferences...
...
    // instantiate the backend...
    const int cache_megs = prefs.get_int ("cache-size-megs", 10);

OK, so we have (signed) int.  That may be a problem as the spec says int
is only required to hold 16 bits (tho some platforms may standardize on
larger, 32-bit or 64-bit ints, for instance), signed int (since it isn't
uint, unsigned int) reserving one for the sign, thus 15 bits.  1024 is
2^10 so we have five bits to play with, but 0 is a number too and counts
as positive so the range is one less on the positive side:
32*1024-1=32767.

We're using four bits for that 16*1024 above.  Upping that to five bits,
for a max of 32*1024-1, may well be possible, but beyond that could get
complicated for some archs at least.

Since a negative cache size doesn't make a lot of sense, uint would be
an option, gaining us a bit, to just under 64 GiB, but that's still way
under your desired 200 GiB.

Changing that to a long might be possible but gets a bit complex for my
non-coder abilities (tho if I were determined enough I'd definitely
experiment with it for my own patches).

Of course if your platform defines an int as 32-bit or 64-bit, then
upping it to say 256*1024 shouldn't be a problem for that platform, and
you could certainly apply the patch yourself, tho I don't know enough
about the various platforms (both MS Windows and *ix) pan runs on to
know if it's safe to stay at int everywhere or if a switch to long is
required.  But 32-bit should be plenty in any case, since we're dealing
in MiB already and that'd take us (if my math is correct) to 2 PiB max
for a signed 32-bit int, which I guess should be a /few/ years away,
anyway! =:^)

Meanwhile, I'd suggest trying say 32*1024-1 (=32767) MiB, and see if pan
actually uses that, first, regardless of what the GUI says.  If my point
above is correct, pan should take that even if the GUI says differently.
Tho I'm not sure how pan actually manages cache -- if it lets it get a
bit above that and then deletes back down to it, that might blow up,
while if it gets to that and then deletes say a GiB to make room, it
should be fine.

If that works, then try above that, say your 200 GiB.  If your platform
uses 32-bit or 64-bit ints, that'll be fine.  If not, it won't.
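
For anyone who wants to check the bit arithmetic above, here is a small
sketch.  It is not pan code: the helper names (max_signed_mib,
max_unsigned_mib, mib_to_gib, fits_in_int) are made up for illustration,
and it only assumes the cache size is kept as a count of MiB in an
integer of the given width.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers, not part of pan.

// Largest MiB count a signed integer of `bits` width can hold
// (one bit goes to the sign).  Valid for widths below 64.
constexpr std::int64_t max_signed_mib(int bits)
{
  return (std::int64_t{1} << (bits - 1)) - 1;
}

// Largest MiB count an unsigned integer of `bits` width can hold.
// Valid for widths below 63.
constexpr std::int64_t max_unsigned_mib(int bits)
{
  return (std::int64_t{1} << bits) - 1;
}

// Whole GiB covered by a MiB count (1 GiB = 1024 MiB), rounded down.
constexpr std::int64_t mib_to_gib(std::int64_t mib)
{
  return mib / 1024;
}

// True if a requested size in MiB fits in a plain int on the platform
// this is compiled for.
constexpr bool fits_in_int(std::int64_t mib)
{
  return mib >= 0 && mib <= std::numeric_limits<int>::max();
}
```

With these, max_signed_mib(16) is 32767 (just under 32 GiB),
max_unsigned_mib(16) is 65535 (just under 64 GiB), and
max_signed_mib(32) is 2147483647 MiB, or roughly 2 PiB.  Calling
fits_in_int(200000) on your own build answers the "is my int big enough
for 200 GiB" question directly.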
Meanwhile, our last hit, in gui.cc:

void GUI :: prefs_dialog_destroyed (GtkWidget *)
{

  const Quark& group (_header_pane->get_group());
  if (!group.empty() && _prefs._rules_changed)
  {
    _prefs._rules_changed = !_prefs._rules_changed;
    _header_pane->rules(_prefs._rules_enabled);
  }
  _cache.set_max_megs(_prefs.get_int("cache-size-megs",10));

}

This appears to be where pan actually loads the setting when the prefs
dialog is closed.  Note that _cache.set_max_megs is OUTSIDE the
if-conditional so applies WHENEVER the prefs dialog is closed.  That
means whether you've actually changed anything or not.

Which unless I'm mistaken means that if you set a larger cache manually,
you'll have to be very careful NOT TO OPEN the prefs dialog, since the
moment you close it, pan will reset to the max 16*1024, 16 GiB cache
size, thus triggering a delete of anything above that in the cache!
That could make for quite some frustration -- I know as I remember
similar problems way back when I originally requested that bump from
1 GiB max, back in the day.
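
To make the suspected reset concrete, here is a toy model.  It is an
assumption, not pan's actual behavior: it supposes the spin button
clamps the stored value into the 10 to 1024*16 range it was created
with and that the clamped value is what gets handed to set_max_megs
when the dialog closes.  The names (kGuiMinMegs, kGuiMaxMegs,
clamp_to_gui_range, megs_lost_on_dialog_close) are hypothetical.

```cpp
// Hypothetical model only.  The bounds mirror the arguments seen in
// new_spin_button ("cache-size-megs", 10, 1024*16, prefs).
constexpr int kGuiMinMegs = 10;
constexpr int kGuiMaxMegs = 1024 * 16;

// What a range-limited spin button would turn a stored value into.
constexpr int clamp_to_gui_range(int megs)
{
  return megs < kGuiMinMegs ? kGuiMinMegs
       : megs > kGuiMaxMegs ? kGuiMaxMegs
       : megs;
}

// MiB the cache would have to shed if the clamped value were applied
// on top of a larger hand-edited setting.
constexpr int megs_lost_on_dialog_close(int configured_megs)
{
  return configured_megs > kGuiMaxMegs ? configured_megs - kGuiMaxMegs
                                       : 0;
}
```

Under that assumption a hand-edited 200000 would come back as 16384,
putting 183616 MiB of cached articles up for deletion, while a 12 GiB
(12288 MiB) setting would pass through untouched.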

That concludes my initial code analysis.  I don't know whether you even
build from source and guess that if you do, either you can patch as well
or you can at least figure out what to try and manually change from the
discussion above, so I won't post patches ATM, anyway.

Meanwhile, filling in the alternatives I mentioned earlier...

> The alternative would be modifying your download style a bit.  Pan's
> assumptions about how people do downloading are obviously different than
> yours and mine, and a huge cache isn't needed for its way.  But some of
> us obviously use pan differently, and while it generally works, it's
> not quite the easy fit it would be if those assumptions were different.
> More about that later when I have more time and/or have taken a look.

What I was referring to here is simply the fact that pan seems to assume
that binary downloaders direct-download, that is, find what they want
and tell pan to save attachments directly.  Pan really doesn't use a lot
of cache in this mode because it's deleting posts as fast as it's
downloading them and saving the attachments, so its 10 MB default cache
is generally enough.

OTOH, there's a very different type of binary group usage that I do, and
with your request, that I guess you do as well.  I tend to want to set
pan up to download anything that looks interesting to cache and then go
away for awhile, say to work or to sleep.  When I get back or wake up
several hours later, all those posts are cached locally and I can browse
thru them as I like, basically instantly, sorting as I go, saving off
attachments that I decide I really want to keep and then normally
deleting the messages, deleting others without permanent saving if after
looking at them I decide they're not worth keeping.  Of course this
requires a *MUCH* larger cache, generally large enough to contain the
entire download session, since in the first stage it's all only
downloaded to cache, where it must remain until I've gone thru it.

But, as attachment sizes and volume increases, there comes a point at
which the pre-cache method doesn't work so well.  For still images and
even mp3s and low-ish resolution mpegs of a few minutes max, it still
works reasonably well, and for that a cache size of say 16 Gig should be
fine since that's more or less what you can sort thru after a single
session anyway.

But if you're looking at a 200 GiB cache, I'm guessing you're doing
rather larger attachments, ISO-images and/or half-hour minimum possibly
HD-resolution TV programs and feature-length movies.  Several hundred
MiB files minimum, 4.7 GiB DVD images, possibly full 20-ish GiB Blu-ray
images, and/or perhaps whole TV series at a time.

For this, pre-caching really doesn't work so well anyway, in part
because pan doesn't direct-preview them as it does still-images.  As
such, the direct-save method becomes the only practical method, both
because it doesn't require the huge cache, and because you have to save
the files off to view them anyway.

So IMHO anyway, you might wish to reconsider your download method.  At
your volume the direct-save method is likely to be most practical in any
case, and shouldn't require that huge cache.

> One other possibility.  It's possible (at least on Linux, don't know
> about MS Windows or Apple OSX) to run multiple different pan instances
> at once.  If you are active in enough groups and they split by subject
> well enough, you could do that.  There's a variable that can be set to
> point pan at a directory other than its default, and you can set this
> differently to have multiple different pan setups.  I do that here.  If
> multiple pan setups each with a 16 gig cache would work...

The environmental var in question is PAN_HOME.  Here I set it in a
wrapper script to one of my three pan profiles, with one script for each
profile: bin, text, test.  Of course if you want and the groups are
separate enough, you can make that say mp3s, dvds, tvprogs.  Or
whatever.

Specifically, for my text groups I set PAN_HOME to ~/pan/text, so it
uses the settings there instead of in the default ~/.pan2.  That lets me
have different settings (including different cache sizes) for each of my
profiles.

Then where appropriate, I use symlinks in each profile, pointing to
files in ~/pan/globals for some files (like my shared scorefile), and
for my binary profile, pointing my cache to a separate, dedicated
partition, that's only used for cache for my pan binary profile.

Meanwhile, yet another possibility I didn't think of earlier...

You can setup a local news server such as leafnode.  You'd then
configure it to do the mass downloading from your NSP into its cache,
and could then simply point pan at your local leafnode or whatever news
server.  Then you could probably leave pan's cache at the default 10 MiB
size and/or even place it in tmpfs (a RAM-based filesystem), since your
server with its own cache of whatever size would be local anyway.

So here's hoping you find at least some of that helpful... =:^)

---
[1] GNKSA: Good Net-Keeping Seal of Approval.  While even its keepers
acknowledge it's a bit outdated today, back in the day it served as a
widely accepted guideline for news-client acceptable net behavior.  See
the pan website for details about pan's compliance and a link, and the
list archives for previous discussions about pan's compliance here.


--
Kind regards

Andreas Nastke
IT System Management

g/d/p Markt- und Sozialforschung GmbH
A company of the g/d/p research group
Richardstr. 18
D-22081 Hamburg
Phone: +49 (0)40 / 29876-117
Fax: +49 (0)40 / 29876-127
address@hidden
www.gdp-group.com

Registered office: Hamburg; commercial register Hamburg, HRB 40482
Managing directors: Christa Braaß, Volker Rohweder

