
Re: [Pan-users] Scoring based on arbitrary headers?


From: Duncan
Subject: Re: [Pan-users] Scoring based on arbitrary headers?
Date: Fri, 9 Jan 2015 11:23:47 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 2786476)

Jim Henderson posted on Thu, 08 Jan 2015 19:02:20 +0000 as excerpted:

>> I guess I should do a bit of experimentation of my own... but I'm lazy.
>> Still, if I get the motivation... sometimes these things build in the
>> background until I just decide to do it one day...
> 
> I know the feeling. :)

OK, I have tomorrow off and decided it's time to experiment (even if it's 
past 3 AM here ATM)...

* Pan, at least git-pan (version in headers) *DEFINITELY* knows how to 
score on arbitrary headers -- it works here! =:^)

* Of course after adding the new scoring rule, you must either tell pan 
to do a rescore manually, or reload the group (by switching to another 
and back), so pan knows the scorefile has changed, before it'll show the 
results of the new scoring rule on existing messages.

* As expected, overview-only messages ("header-only" messages that don't 
have the full message in cache) do not get the arbitrary-header scores 
applied as those headers aren't downloaded yet.

* Downloading the message *DOES* appear to apply the arbitrary-header 
scores automatically, provided of course that pan already knows about 
them (see the second point, above).
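
As a rough illustration of the behavior in the last two points, here is a small Python sketch. It is not pan's actual code; the function name, the rule tuples, and the set of "overview" fields are assumptions made up for the example. It only models the idea that a rule on a non-overview header cannot match until the full message is in cache.

```python
import re

# Assumed for illustration: headers that arrive with the overview.
OVERVIEW_FIELDS = {"subject", "from", "date", "message-id", "references"}

def score_article(overview, cached_headers, rules):
    """Sum the scores of all matching rules.

    overview       -- dict of overview headers (always available)
    cached_headers -- dict of full headers, or None if not downloaded
    rules          -- list of (header_name, regex, score) tuples
    """
    total = 0
    for header, pattern, value in rules:
        key = header.lower()
        if key in OVERVIEW_FIELDS:
            text = overview.get(key)
        elif cached_headers is not None:
            text = cached_headers.get(key)
        else:
            # Arbitrary header on an overview-only message: nothing to match.
            text = None
        if text is not None and re.search(pattern, text, re.IGNORECASE):
            total += value
    return total

rules = [("User-Agent", r"^Pan/", 100), ("Subject", r"scoring", 10)]
overview = {"subject": "Scoring based on arbitrary headers?", "from": "Duncan"}

header_only = score_article(overview, None, rules)
downloaded = score_article(overview, {"user-agent": "Pan/0.140"}, rules)
print(header_only, downloaded)
```

The same message scores differently before and after download, which matches what I saw: the arbitrary-header rule only takes effect once the full headers are available.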


So far, as predicted.  HOWEVER, unhappily...

* Applying arbitrary-header scores has a rather high per-processed-
message cost, as pan loads every single cached message in the active 
group as it checks for that arbitrary-header-match.

On a default 10 MiB cache that's not going to be too terrible, but on my 
unexpiring-archive multi-gig cache with messages going back to 2002 in 
some groups (including this list/group), it can take /minutes/ to load a 
group as it scans all those cached messages in order to score them!

So, as I suggested, you'll want to ensure a cache size large enough to 
contain all the messages you want to cache and arbitrary-header score, 
BUT, arbitrary-header scoring will quickly turn unworkable due to waiting 
if you're like me and have over a GiB of primarily text (so small) 
messages cached!
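
To show why the cost grows with cache size, here is a Python sketch of what such a scan has to do in principle. Again, this is not pan's implementation; the directory layout, file names, and the `scan_cache` helper are hypothetical. The point is only that every cached file must be opened and its header block parsed, so the work scales with the number of cached messages.

```python
import email.parser
import pathlib
import re
import tempfile

def scan_cache(cache_dir, header, pattern):
    """Return names of cached files whose `header` matches `pattern`.

    Each file is opened and its headers parsed, one file at a time.
    """
    parser = email.parser.BytesParser()
    regex = re.compile(pattern)
    hits = []
    for path in sorted(pathlib.Path(cache_dir).iterdir()):
        with path.open("rb") as fp:
            # headersonly=True stops parsing after the header block.
            msg = parser.parse(fp, headersonly=True)
        value = msg.get(header)
        if value is not None and regex.search(value):
            hits.append(path.name)
    return hits

# Tiny made-up cache with two messages, just to exercise the function.
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "1.msg").write_bytes(
        b"From: a@example.org\nUser-Agent: Pan/0.140\n\nbody\n")
    pathlib.Path(d, "2.msg").write_bytes(
        b"From: b@example.org\nUser-Agent: slrn\n\nbody\n")
    hits = scan_cache(d, "User-Agent", r"^Pan/")
    print(hits)
```

With two files that is instant; with many years of cached messages per group, the same per-file open-and-parse loop is what adds up to minutes.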

* Additionally, there's one problem message in one of those groups that I 
reported as triggering a segfault.  I have it saved to investigate 
further later so I've not deleted it, and normally simply don't click on 
it so it doesn't crash pan.  However, as a result of pan scanning full 
messages when it has arbitrary-header scores loaded, with such a score 
active I can no longer enter that group, since pan will try to scan that 
message and promptly segfault!  Similarly, I can't tell pan to get new 
headers (um... overviews!) for all groups (even if I'm in a different 
one), since apparently that triggers a scan of that file as well.

I did test to see if I could get _overviews_ for individual groups and 
that works fine.  I can also switch groups, which works fine (tho it 
takes "forever" as mentioned, especially for the more active or longer 
cached history groups) as long as I don't try to switch to /that/ group.  
As soon as I do something that'll trigger a rescore for that group, 
however, pan will crash due to scanning that known-bad message.

OK, so I should really finish that investigation and delete that message, 
or at least move its cache-file elsewhere.  That'd presumably let me 
access that group again.  However, arbitrary-header-score-scanning is 
really too slow for the number of messages I have cached anyway, so I'll 
probably simply delete those test scores and forget about using arbitrary-
header scoring here.

But it /does/ work!

Thus, I'd guess you either (a) failed to reload the scorefile (either by 
rescoring or by toggling to a different group and back), or (b) 
something's wrong with your scoring rule, or (c, for others) perhaps 
you're using an old version, which doesn't have the patch enabling that 
feature.

But I can see from your headers that c shouldn't apply as you're on git 
as well (the same commit I was running until a couple days ago, when an 
update pulled in a couple l10n updates, no actual code), leaving a or b.

The good news is that it does work, and that with your 10 MiB default 
cache size, while you might notice a bit of a slowdown switching groups, 
it shouldn't be the multiple minutes I'm getting here, with a gig plus 
cache and messages back to 2002 in some groups.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



