Re: [Pan-users] Pruning


From: Duncan
Subject: Re: [Pan-users] Pruning
Date: Mon, 22 Apr 2013 06:34:02 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f3d4165 /usr/src/portage/src/egit-src/pan2)

Beartooth posted on Sun, 21 Apr 2013 13:57:10 +0000 as excerpted:

> My .pan2 is running close to 400 MB, and I'm sure most of it is an aged
> accretion of cruft; I'd like to edit it down to somewhere between a
> tenth and a quarter of that. Is there an easy way, that will do no harm?

Well, "easy" is relative... and "do no harm" is relative as well, but 
yes, there's a way.

FWIW, my pan text-instance directory (.pan2, except that it's pointed 
elsewhere here) is about a gigabyte, but that's because I deliberately 
set no expiration on my various text groups and a multi-gig cache size, 
so nothing expires in those groups.  I have messages in some groups going 
back years, some of them on servers or in groups that no longer publicly 
exist.


What I'd recommend doing first is using a graphical tool such as filelight 
or fsview (both kde tools but gnome and others probably have similar, 
looks like pysize is one such more universal tool), opening it to your 
~/.pan2 dir.  These tools show a graphical representation of files and 
(nested) directories by size, so it's dead easy to see what specific 
files are taking up the most room, and just how much room they are taking 
up as a percentage of the whole.

For instance, here, filelight tells and shows me that the article-cache 
subdir is taking up 92% of all the space used by my pan text instance 
data dir, 975 MB out of that gig I mentioned.  The groups subdir is 
taking up another 6% (67 MB), leaving 2% for the small stuff, but it's 
the groups subdir that has the largest files, with the largest single 
file being groups/gmane.linux.gentoo.devel , which is taking up 27+ MB on 
its own, about 2% of that 1-gig total and nearly half of the groups 
subdir, all by itself!  The four biggest files following that are 5-7 MB 
each, before they get too small for filelight to show them unless I dive 
into the groups subdir itself, making it the working dir on which 
percentages are based, etc.

It's thus immediately obvious that with the article-cache being 92% of 
the total, if I wanted to reduce the total substantially, I'd *HAVE* to 
shrink my article cache.

But of course as I said I'm not doing that here, as I'm effectively 
archiving those articles in pan.


But the story for most people should be quite different.  Pan's default 
cache size is 10 MB.  So if your .pan2 dir is 400-ish MB as you say, then 
unless you set pan's cache to something well over the default, or there's 
a bug and pan's not deleting files when the cache gets too big, deleting 
the entire 10 MB default cache won't do you much good.

Which is where the graphical filesize/directorysize tools help out, as it 
becomes immediately obvious what's taking up the space, and you can then 
either ask about that or simply do a backup, then delete the working copy 
and see if its loss fits your idea of "do no harm", or not.  (If you find 
it harmful, you can simply restore from that backup you made before the 
delete, thus my specific mention of the backup.)
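If you have no graphical size tool handy, the same per-file and per-subdir breakdown can be approximated with a few lines of Python.  This is only a rough sketch: the function names tree_size and size_report are made up for illustration, it counts apparent file sizes rather than on-disk blocks, and it assumes you point it at wherever your pan data dir actually lives.

```python
import os
from pathlib import Path


def tree_size(path):
    """Total apparent size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # don't follow or count symlinks
                total += os.path.getsize(full)
    return total


def size_report(base):
    """Return (name, bytes, percent) per top-level entry, largest first."""
    base = Path(base)
    sizes = {}
    for entry in base.iterdir():
        if entry.is_symlink():
            continue
        if entry.is_dir():
            sizes[entry.name] = tree_size(entry)
        else:
            sizes[entry.name] = entry.stat().st_size
    total = sum(sizes.values()) or 1  # avoid dividing by zero on an empty dir
    rows = [(name, size, 100.0 * size / total) for name, size in sizes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling size_report(os.path.expanduser("~/.pan2")) and printing the rows gives the same sort of "which entry is eating the space" view as filelight, just without the pretty picture.  It only reads, so it fits the "do no harm" requirement.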


Alternatively, here's a functional description of the various files and 
subdirs and what they do, so you can figure out for yourself whether 
losing that will be a big deal or not:

Subdirs:

article-cache: This is where pan stores the whole articles it has 
downloaded.  By default, this cache is limited to 10MB in size, so 
articles will be relatively temporarily stored here.  If you do primarily 
text groups, 10 MB might be a few days to a couple months worth of 
articles in cache.  If you do primarily huge binaries, ISO images and the 
like, obviously 10 MB won't hold much at all, just the parts pan's 
downloading and assembling to decode and save right then.  (The control 
for cache size is in pan prefs, near the bottom of the behavior tab, in 
the article-cache section.  However, if your pan is old enough, you won't 
have it there, and will have to edit it directly in preferences.xml using 
a text editor.)

article-drafts: This holds draft articles you saved before sending (and 
with new enough pan, an autosave as well, but it gets reused with every 
article you compose, so...).  It could be quite big if you saved a bunch 
of them and haven't cleaned it up recently, and thus might be a candidate 
for cleaning.  If there's lots of files in here, try ordering them by 
date or size and deleting either the oldest or largest.

downloaded-attachments: AFAIK, this dir (if you have it at all) is an old 
one that should be safe to delete as pan no longer uses it by default.  
But to be sure, back it up before deleting, just in case.

encode-cache: This one's used by pan as temporary workspace for the 
(relatively) new binary-upload feature.  If your pan is too old to have 
that, you shouldn't have this dir, either.  But it should be empty or 
nearly empty unless pan crashed in the middle of an encode step, as pan 
should clean it out when it's done.  If it's not empty (and you're not in 
the middle of a binary upload session), you should be able to delete the 
files here without damage.

groups: This subdir IS IMPORTANT, as files within it contain pan's header 
cache, one file per group.  These files MAY get somewhat large -- as I 
mentioned, that's where my largest individual files are located in my pan 
text instance data dir, but as long as you don't do like me and set 
unexpiring, they shouldn't grow without limit (unless you have filesystem 
corruption or something).  **HOWEVER**, they **MAY** be QUITE large for 
the most active binary groups, particularly on servers with decent binary 
retention (into the months or years).  I'd not be surprised to see the 
groups subdir files for active binary groups exceeding 100 MB in size, 
unless you have expiry set short enough to counteract that.

But of course you CAN delete the groups subdir files for groups you no 
longer visit and are no longer subscribed to, without issue, since 
they're just wasting space...

Also of special mention is the relatively new Sent file, corresponding to 
the "Sent" pseudogroup within current pan.  If you send a lot of messages, 
this file could get pretty big over time.

It's worth noting that you can open these files in a text editor and look 
around if you're curious.  They are well commented at the top with an 
explanation of what is there and its format.  Just don't save any changes 
unless you know what you are doing... or are prepared to lose the header 
data for that group (maybe with a backup, just in case) if you screw up 
the edit.

ssl_certs:  This subdir will likely contain very small hash-data files 
for each of your servers that you have configured to use SSL.  However, 
these should be small indeed, only a few bytes each.

(Of course it's worth noting that on many filesystems, a file takes space 
in "blocksize" chunks, with "blocksize" often being either 1024 or 4096 
bytes (1 or 4 KB).  So these very small files, six bytes each here, will 
normally still take 1024 or 4096 bytes of space on most filesystems 
including ext*.  Still, it'd take either a big bug or a *LOT* of 
configured servers in order to make this dir big enough to get out of 
the noise at all.)
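To put numbers on that blocksize point, here's a small sketch of the arithmetic.  The function name allocated_bytes is made up for illustration, and it assumes plain whole-block allocation; filesystems that pack tiny files inline or use tail-packing will do better than this.

```python
def allocated_bytes(size, block_size=4096):
    """Space a file of `size` bytes takes if allocated in whole blocks."""
    if size == 0:
        return 0
    blocks = -(-size // block_size)  # ceiling division without floats
    return blocks * block_size


# Six-byte hash files, one per SSL server, on 4 KB blocks:
per_file = allocated_bytes(6)       # 4096
hundred_servers = 100 * per_file    # 409600 bytes, about 400 KB
```

So even a hundred SSL-configured servers would only cost about 400 KB under that assumption, which is still noise next to a 400 MB data dir.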


That's the subdirs, here's the files appearing in .pan2 itself:

Score: scorefile.  If you use scores you don't want to delete this.  If 
you only assign scores using pan's GUI and you do it a lot, this file 
could be pretty big, as pan's GUI isn't very efficient at storing the 
scores it creates.  It's possible to manually edit the file to make it 
far more efficient, without losing any scores, but that's beyond the 
scope of this message, and in any event, given past history, I'd suggest 
you leave it alone unless it's getting to be a REAL problem, because I 
know it's more complex than you're normally prepared to deal with.

accels.txt: The "old-style" keyboard-accels file, possible but difficult 
to hand-edit, as while it's a text-file, it's a machine-ordered menu dump 
that has little/no human logic to it.  AFAIK it's still honored if pan 
finds it, but I believe pan prefers the pan.hotkeys file (new-style), 
now, so it can probably be deleted without issue, if you have the new 
file.  (But as usual, if you've customized your hotkeys, make a backup 
first before trying the delete, just in case.)

downloads.stats: This should be a small file consisting of a comment line 
and a number, that number being the bytes downloaded since the last stats 
reset.  The file will only exist with newer pan, since the feature that 
uses it is still relatively new.

group-preferences.xml:  This file contains a record of most or all groups 
you've visited, since doing so sets some group prefs for that group.  
While you probably don't want to delete the file itself, as doing so 
would delete all your group prefs, hand editing should be possible as 
long as you're careful, and may be desirable, since you can remove 
entries for groups you no longer visit and don't care to retain the 
preferences for.  

newsgroups.dsc: This file contains the newsgroup descriptions as 
downloaded from your servers whenever you refresh the group list.  
However, most groups don't have a good description anyway, so the 
descriptions list is of limited value, and once you have your set of 
subscribed groups and don't change them much or visit unsubscribed groups 
much any more, this is a good deletion candidate.  However, as mentioned 
it'll probably reappear the next time you update your group list.  But 
of course if you seldom do that, since you already have your list of 
subscribed groups and aren't generally interested in new ones anyway, the 
file might stay gone for quite some time.

newsgroups.xov: IMPORTANT!  This file contains a record of the groups 
you've visited and a per-server listing of the highest article number pan 
knows about for each group.  Thus, you don't want to disturb the entries 
for groups you actively visit.  However, the format is simple enough, one 
group per line, that you can delete whole lines for groups that you're no 
longer interested in, if you want.
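As a rough sketch of that whole-line deletion, something like the following would do it.  Big caveat: I'm ASSUMING here that the group name is the first whitespace-separated token on each line and that comment lines start with "#"; check the comments at the top of your own file before trusting that.  The function name filter_group_lines is made up for illustration.

```python
def filter_group_lines(lines, unwanted):
    """Drop lines whose first token is a group name in `unwanted`.

    ASSUMPTION: one group per line, group name first, '#' starts a comment.
    Blank lines and comments are passed through untouched.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        group = stripped.split()[0]
        if group not in unwanted:
            kept.append(line)
    return kept
```

Read the file with readlines(), run it through this, and write the result to a NEW file, so the original stays around as the backup until you've started pan and confirmed nothing's amiss.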

newsgroups.ynm: Semi-important:  This file tracks per-group posting 
permissions: posting allowed (default/y), not allowed/read-only (n), or a 
moderated group (m).  I believe pan rebuilds this file when you update 
the group list, so it's not irreplaceable, but you don't want to go 
randomly deleting it either, as pan could then get quite mixed up if you 
try to post to a moderated or read-only group, until you do update the 
group list again.

newsrc*:  IMPORTANT!  There should be one of these files per server.  
They track read messages.  If a newsrc file for a server goes 
missing, pan will lose this information and will show all messages on 
that server as unread once again (tho it's actually a bit more complex 
than that, since the read status from multiple servers carrying the same 
groups interact).

It's possible to manually edit the newsrc files without /too/ much 
trouble if you're careful.  Unsubscribed groups will have an exclamation 
point (!) appended, while subscribed groups will have a colon (:) 
appended.  If you've visited the group, there will be a space, and the 
article numbers for that group and server that you have marked as read.  
Some people may be interested in removing the tracking for groups they no 
longer visit, by removing the space and number sequence.
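For the curious, here's a minimal sketch of picking a newsrc line apart along the lines just described.  It assumes the usual newsrc conventions (group name, then ":" or "!", then an optional comma-separated list of article numbers and lo-hi ranges); the function names are made up for illustration and none of this is pan's own code.

```python
def parse_newsrc_line(line):
    """Split one newsrc line into (group, subscribed, ranges_text)."""
    line = line.rstrip("\n")
    for i, ch in enumerate(line):
        if ch in ":!":
            # ':' marks a subscribed group, '!' an unsubscribed one
            return line[:i], ch == ":", line[i + 1:].strip()
    raise ValueError("no ':' or '!' marker in line: %r" % line)


def count_read(ranges_text):
    """Count articles covered by a string such as '1-100,105'."""
    total = 0
    for part in ranges_text.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


def drop_read_ranges(line):
    """Return the line with its read-article tracking removed."""
    group, subscribed, _ranges = parse_newsrc_line(line)
    return group + (":" if subscribed else "!")
```

For example, drop_read_ranges("alt.test! 1-5") gives back "alt.test!", which is the "remove the space and number sequence" edit done mechanically.  As with any of these edits, work on a copy and keep the original until you're sure.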

pan.hotkeys: This is the new-style keyboard-accels file.  It's easier to 
hand-edit if desired as there's comments and it's actually logically 
ordered, but changing the assignment in pan prefs is preferred.  If you 
have custom keyboard-accels configured you'll want to keep this file, but 
you might consider removing accels.txt, above, if you have both.  This 
new-style version is relatively recent, however, so older pan 
installations may not have this file, only the older one.

posting.xml: IMPORTANT! This file contains your posting profiles.  
Obviously you don't want to remove it unless you don't care about them, 
but it's reasonably easy to hand-edit, if you're careful not to break the 
xml.  However, it should remain reasonably sized unless you go hog wild 
with hundreds/thousands of profiles.

preferences.xml: IMPORTANT! This file contains pan's general preferences 
including the cache size preference mentioned above.  It's reasonably 
easy to edit as long as you're careful not to break the XML.  This file 
should remain pretty close to the same size (near 9 KB) always, tho 
individual changes will change it by a few bytes.

servers.xml: IMPORTANT!  This file contains your server configuration.  
Again, it's reasonably easy to edit as long as you don't break the XML, 
and indeed, hand-editing this file is the only way to get some settings. 
(It's possible to set an arbitrary server rank here, for instance, while 
pan's GUI is limited to primary and backup.  Similarly, per-server expiry 
can be set to an arbitrary number of days, instead of the far more 
limited options the GUI gives you.  Finally, it's possible to set an 
arbitrary number of connections that pan will try to use if the server 
allows it here, while due to GNKSA, the GUI limits the maximum number of 
connections to 4.  That can be useful for paid accounts that allow 20, 
30, 50... connections, altho once you get into the double-digits, unless 
you're lucky enough to have a gigabit link to the internet, it becomes 
increasingly likely that more connections simply increase overhead and 
thus slow you down, instead of increasing download speed.)  Again, this 
file should remain reasonably small, unless you go hog wild configuring 
hundreds/thousands of servers...

tasks.nzb: This is a standard *.nzb file, containing pan's list of 
uncompleted downloads.  (It only stores Message-IDs, not group refresh 
task data, which I believe is lost when pan exits.)  The file will thus 
be larger when you have a long list of downloads queued up, but should 
shrink to pretty small (just the standard nzb xml schema info, basically, 
194 bytes, here) when there aren't any articles queued for download.
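If you want to see how much is actually queued without opening pan, a quick sketch like this counts the entries.  It assumes the common nzb layout of file elements containing segment elements, and it matches on the local tag name so the XML namespace doesn't matter; the function name count_nzb_entries is made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_nzb_entries(xml_text):
    """Return (files, segments) found in an nzb document given as text."""
    root = ET.fromstring(xml_text)
    files = 0
    segments = 0
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # strip any '{namespace}' prefix
        if tag == "file":
            files += 1
        elif tag == "segment":
            segments += 1
    return files, segments
```

An empty queue should come back as (0, 0), matching the "just the schema info" case above.  Read tasks.nzb while pan isn't running, since pan rewrites the file as the queue changes.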


I'd certainly investigate any files or subdirs other than those listed 
above, since they're likely to be from something other than pan...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



