pan-users

Re: [Pan-users] article cache size


From: Duncan
Subject: Re: [Pan-users] article cache size
Date: Sat, 9 Mar 2024 01:49:34 -0000 (UTC)
User-agent: Pan/0.155 (Kherson; 020f52b16)

David Chmelik posted on Fri, 8 Mar 2024 05:19:17 -0000 (UTC) as excerpted:

> On Tue, 30 Sep 2014 21:10:56 +0000 (UTC), Duncan wrote:
>> As to your question, years ago I was the person who asked to bump the
>> max cache size from 1 GiB -- I needed 4 GiB at the time and it was
>> bumped to 20, which was great.
> 
> What size do you recommend if I currently use 1,500+ newsgroups, and
> some are binary but dead, so let's say all plain-text, but some are
> high-traffic like the Linux kernel listserv on gmane?  I rarely read
> that; it's more out of curiosity.  There's maybe under 40 I'd read daily
> if they have traffic, but many/most don't except occasionally/rarely,
> though usually there's something daily.  Most are miscellaneous
> subjects, like computer science/engineering & software I just
> occasionally have questions on, like here, but other times don't keep up
> on, and just select and mark read.

Interesting/good question.

The discussion below gets a bit technical and arguably goes on a couple 
tangents.  Jump to the 4th paragraph from the end if you're just 
interested in some recommendations.  Read thru if you like technical and 
find tangents interesting! =:^)

Primarily, the practical news-cache size depends on how you use pan and 
how long you intend to retain messages.

Pan's cache-size default, way too small by my usage (text or binary, I 
have separate instances for each), appears to be designed primarily for 
either text-only with some short-term (a few sessions) caching or process-
as-you-go (not even a single full session) with anything above trivial 
numbers of binaries.

My usage, instead, is archiving for text, and for binary, multi-session 
sampling and download-interesting-to-cache first session, then go through 
again when everything's cached so access is instant, to sort out what I 
downloaded and either delete directly if I decide I don't want to save it 
permanently after all, or sort and save off to permanent storage, then 
delete from pan (which I believe deletes from cache).

While the default size would (for my usage) keep text around a few 
sessions so I could refer back to messages if I wanted to see a full 
message when it was context-quoted in the reply, it certainly wasn't 
suitable for long-term "archiving" storage of any sort.  For binaries it 
was HORRIBLE, as I'd hit the cache-size limit and start deleting older 
messages in the first session, before I did anything but read the 
overview!  I wasn't even reading downloaded messages before they were 
deleted due to cache limits!

So you said basically text (what I'd call *MANY* groups (1500)), with some 
high traffic and perhaps a few trivial binaries.  Great.  But how much do 
you download to keep around even if you don't read it, and how long do you 
actually want to KEEP it around?

Here, for text and trivial binaries (say "trim" for HTML messages level in 
some text groups that allow them (the kernel group while high traffic does 
NOT AFAIK), the occasional screenshot, etc), only a relatively few groups 
but with near all traffic to them archived (unexpiring-cached) in some 
cases since 2002...

Here's what compsize (transparent compression report for btrfs) says for 
my text instance dedicated partition, basically the .pan directory but 
mostly cache:

$$ sudo compsize /nt/
Processed 278330 files, 180543 regular extents (180543 refs), 99005 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       47%      999M         2.0G         2.0G
none       100%       14M          14M          14M
zstd        47%      985M         2.0G         2.0G

(Compsize says the article cache itself is 970M used, 1.9G uncompressed, 
so it is indeed most of the above.  And the 14 MiB of incompressible data 
is in the cache, so I'll presume it's pre-compressed binaries sent 
yencoded, because MIME/UUE encoding is inefficient/compressible.)

So roughly 2 GiB uncompressed, compressed down to ~half size or ~1 gig 
using zstd (level 3, default for btrfs if zstd compression is chosen) 
compression.  Only a trivial 14 MiB is incompressible.
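
As a rough cross-check of that ratio, here is a minimal Python sketch that 
parses the TOTAL row of the compsize report.  The helper names (to_bytes, 
parse_compsize_row) are made up for illustration, and it assumes compsize's 
single-letter units are 1024-based.  Because the report's figures are 
already rounded, the computed ratio comes out slightly different from the 
47% compsize prints.

```python
# Unit multipliers, ASSUMING compsize prints binary (1024-based) units.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a size such as '999M' or '2.0G' to a byte count."""
    return float(text[:-1]) * UNITS[text[-1]]


def parse_compsize_row(row):
    """Split one data row of the report into named columns (hypothetical helper)."""
    kind, perc, disk, uncompressed, referenced = row.split()
    return {
        "type": kind,
        "perc": int(perc.rstrip("%")),
        "disk": to_bytes(disk),
        "uncompressed": to_bytes(uncompressed),
        "referenced": to_bytes(referenced),
    }


total = parse_compsize_row("TOTAL       47%      999M         2.0G         2.0G")
ratio = total["disk"] / total["uncompressed"]
print(f"on-disk / uncompressed = {ratio:.2f}")
```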

Here's the btrfs filesystem usage report for that partition, which is 
btrfs raid1, so I can use half that 10 GiB total space:

Overall:
Device size:                  10.00GiB
Device allocated:              3.06GiB
Device unallocated:            6.94GiB
Device missing:                  0.00B
Device slack:                    0.00B
Used:                          2.37GiB
Free (estimated):              3.63GiB      (min: 3.63GiB)
Free (statfs, df):             3.63GiB
Data ratio:                       2.00
Metadata ratio:                   2.00
Global reserve:               16.69MiB      (used: 0.00B)
Multiple profiles:                  no

Data,RAID1: Size:1.00GiB, Used:853.71MiB (83.37%)
/dev/sdd8       1.00GiB
/dev/sdc8       1.00GiB

Metadata,RAID1: Size:512.00MiB, Used:357.91MiB (69.90%)
/dev/sdd8     512.00MiB
/dev/sdc8     512.00MiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
/dev/sdd8      32.00MiB
/dev/sdc8      32.00MiB

Unallocated:
/dev/sdd8       3.47GiB
/dev/sdc8       3.47GiB

Now btrfs stores small files (2048 byte and under by default, which I use 
here) in-line in the metadata, and some of those text-message cache files 
will certainly qualify, thus explaining the difference between the 
reported data usage of ~854 MiB here while compsize said 999 MiB -- some 
of that 999 is stored in the metadata not data.

Total used including metadata is 2.37 gig but that's across both physical 
devices so divide by two for raid1, ~1.2 gig of data+metadata.  The 3.63 
GiB reported Free pre-accounts for the raid1, including 3.47 GiB not 
allocated (per device) plus the still unused space within the data 
chunks.

So of the 5 GiB effective space (5 gig per device but raid1 across two 
devices), ~1.2 gig is used, ~3.6 gig is free, and the other ~0.2 gig is in 
the unused metadata, system chunk, etc.
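
The raid1 accounting above can be re-done in a few lines of Python.  This 
is only a sketch using the rounded figures from the usage report; the 
variable names are mine, not anything btrfs reports.

```python
# Figures copied from the "btrfs filesystem usage" report (all in GiB).
device_size = 10.00      # both devices combined
used_raw = 2.37          # counts both raid1 copies
free_estimated = 3.63    # btrfs already adjusts this for raid1
data_ratio = 2.00        # raid1 keeps two copies of everything

effective_size = device_size / data_ratio                  # space usable for one copy
effective_used = used_raw / data_ratio                     # data+metadata, one copy
other = effective_size - effective_used - free_estimated   # unused metadata, system chunk, etc.

print(f"effective size: {effective_size:.2f} GiB")
print(f"effective used: {effective_used:.2f} GiB")
print(f"free:           {free_estimated:.2f} GiB")
print(f"other:          {other:.2f} GiB")
```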

But if I wasn't using btrfs compression it'd be roughly half full.  All in 
all, pretty reasonable usage for a dedicated-usage partition where you 
want some room to grow.

Finally, the pan cache for that:  Again, expiration is set to unexpiring 
(server settings) so it effectively caches "forever", and in prefs the 
size of the article cache is set to 5120 MiB = 5 GiB.

Which pan couldn't actually hit if I weren't using btrfs compression 
because the filesystem itself is exactly 5 GiB, and there's metadata 
overhead plus the non-article-cache files in the pan dir.  But with 
compression it should actually be able to hit that 5 GiB, and could 
probably hit ~9 GiB or so, assuming the same near 2:1 compression ratio 
continues.  So I have room to set that higher as my archive continues to 
grow...
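
A back-of-the-envelope version of that ~9 GiB estimate, as a Python sketch. 
The 0.5 GiB reserve for metadata and the non-cache files in the pan dir is 
purely an assumed figure for illustration, and the 0.5 ratio is just the 
"near 2:1" compression rounded.

```python
def max_uncompressed_cache(effective_gib, reserved_gib, disk_ratio):
    """Uncompressed GiB that fit once compression is accounted for.

    effective_gib: usable filesystem size (one raid1 copy)
    reserved_gib:  ASSUMED overhead for metadata and non-cache files
    disk_ratio:    on-disk size / uncompressed size (0.5 for ~2:1)
    """
    return (effective_gib - reserved_gib) / disk_ratio


# 5 GiB filesystem, 0.5 GiB assumed overhead, ~2:1 compression.
estimate = max_uncompressed_cache(5.0, 0.5, 0.5)
print(f"~{estimate:.1f} GiB of uncompressed cache should fit")
```

If the compression ratio drifts (more pre-compressed binaries, say), the 
estimate shrinks accordingly, so it is worth re-running compsize now and 
then rather than trusting the number.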

Now a guess at translating that for you...  Many more groups (say 100 
times as many...), still mostly text, but presumably you aren't archiving 
"forever", and if I've interpreted your description correctly, you 
probably don't download as much of the groups as I do.  However, at least 
one of those groups is LKML (the kernel list), far higher traffic (if 
enforced text-only) than anything I subscribe to and archive.

At a guess, I'd say start with a gig.  That should reasonably safely 
accommodate even your 100X the number of groups, text-mostly, for a 
"reasonable" period of a month or so, which I'll say is about the max time 
discussion threads are likely to be active so you can refer back to 
previous articles without re-downloading, again assuming you're not 
downloading everything in the group.

If you want to be extra safe or see messages you know you downloaded 
disappearing (and your filesystems aren't going haywire due to crashing 
and filesystem immaturity... btrfs is generally past that now but was 
still a bit iffy when I started with it), double that to 2 GiB 
(uncompressed), which again is roughly what I'm seeing with some groups 
near-archived for 20+ years now, but at ~1% of the groups.

Even with ~1500 groups, text-mostly, downloading-to-cache near all 
messages, I'd be quite surprised to see usage over 2 GiB with an effective 
lifetime of under a month (even two), because that's simply *HARD* to do 
with text-mostly groups ... *UNLESS* you're grabbing some prolifically AI-
spammed groups or something (the *HARD* to do assumes *humans* actually 
writing all those messages -- two GiB of data is simply a LOT of text for 
even a few hundred /humans/ to write over a couple months, but automate it 
with AI and that assumption's out the window!)

If you're considering a dedicated partition, 5 gig for it should be good, 
as it is for me.

If you're actually archiving those 1500 groups... I'd say start with 10 
GiB, but until you have say a year of history to make a reasonable 
projection into the future, watch the usage and be prepared to adjust 
that up or down.  A dedicated partition, if used, should be 
correspondingly larger, maybe 20 or 25 gig. With a 
year of history you should be able to project /reasonably/ comfortably the 
usage out to storage replacement cycle lengths (double the year's activity 
for a reasonable margin and multiply to cover your time until expected 
upgrade, increase by 50% or double again for dedicated partition size if 
used -- unless of course activity is multiplying, as it well could be on 
groups with uncontrolled AI spam).
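
That projection rule can be written down as a small Python sketch.  The 
function name and the example inputs (1 GiB of growth per year, 5 years 
until the next storage upgrade) are hypothetical; plug in your own 
observed numbers.  It assumes roughly steady activity, so it won't hold 
if traffic is multiplying.

```python
def project_sizes(yearly_growth_gib, years_until_upgrade,
                  margin=2.0, partition_factor=1.5):
    """Project cache and dedicated-partition sizes in GiB.

    margin:           double the observed year's activity for safety
    partition_factor: 1.5 (increase by 50%) or 2.0 (double again)
    """
    cache = yearly_growth_gib * margin * years_until_upgrade
    partition = cache * partition_factor
    return cache, partition


# Hypothetical inputs: 1 GiB/year observed, 5 years until replacement.
cache_gib, partition_gib = project_sizes(1.0, 5)
print(f"cache setting: {cache_gib:.0f} GiB, partition: {partition_gib:.0f} GiB")
```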

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
