Re: [Pan-users] Scoring based on arbitrary headers?
From: Duncan
Subject: Re: [Pan-users] Scoring based on arbitrary headers?
Date: Sat, 10 Jan 2015 11:30:24 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 2786476)
Jim Henderson posted on Thu, 08 Jan 2015 19:02:20 +0000 as excerpted:
> On Thu, 08 Jan 2015 05:19:20 +0000, Duncan wrote:
>
>> Jim Henderson posted on Wed, 07 Jan 2015 00:34:20 +0000 as excerpted:
>>
>>>> Meanwhile, just to confirm, arbitrary header scoring did work, but
>>>> only after downloading the messages and possibly manually triggering
>>>> a rescore,
>>>> correct?
>>>
>>> Hmmm, I didn't try a manual rescore, but the scoring that applied to a
>>> post that should have been affected didn't show up when I went to look
>>> at the rules applied.
>>
>> OK, so we do /not/ have confirmation that pan actually does arbitrary
>> header scoring, but we /do/ have confirmation that /if/ it does, it
>> doesn't do it automatically after the download, and requires a manual
>> rescore.
>
> I'm not sure that that's an accurate summary of what my testing found -
> I ended up not getting a score based on an arbitrary header.
>
> Checking the score on a message that I know matches my arbitrary scoring
> rule, it doesn't show the score item I added.
>
> The lines I added to ~/News/Score were:
>
> %BOS %Score created by JSH
> [*opensuse.org*]
> Score:: =9999
> X-Forwarded-For: ^[address redacted]$
> %EOS
>
> Where [address redacted] is a valid IP address. I followed the format
> used for the From: score that appears above it in the file.
With my own testing (as mentioned in a post yesterday) demonstrating that
arbitrary-header scoring does work, and that pan appears to score on
download without a manual rescore, provided it has already loaded that
score, we're left with the following possibilities:
Either:
1) Your regex somehow failed to match,
OR
2) Pan hadn't yet reloaded the scorefile after you edited it, so it
didn't know about your new score when it downloaded your test messages.
OR
3) An absolute =nnnn (as opposed to additive nnnn, no =) score that
happened to match that message, appeared before your test score in the
scorefile. Because absolute scores are intended to be absolute, no
further scoring is done after the first absolute match is found -- that
first match is applied and that's it -- so unlike additive scores,
absolute score order MATTERS.
Here's what I did for my test. I used gmane as my test server and tested
in gmane.* groups. Due to the way gmane works, messages thru gmane have
a header that looks like this (obvious obfuscation applied to avoid gmane
email munging):
Approved: news at gmane dot org
Posts on gmane also have a header like this (picking your post as an
example):
Archived-At: <http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>
Since these are unlikely to be in the overview (tho I didn't actually
check) but are extremely common (pretty much every post) on gmane, I
decided they'd make good arbitrary-header scoring test material.
So:
[*]
Score:: 100 %testing arbheaders
Approved: gmane\.org
Score:: 200 %arbheaders test2
Archived-at: gmane\.org
Now those are additive scores and went below my normal scoring, so if any
absolute scores applied, pan would never get to these, but otherwise,
assuming no further additive scores applied, basically all "current" gmane
messages should get a score of 100+200=300.
Some things to note altho they'll be review for those familiar with the
scorefile format:
% starting a line indicates a comment. All those %BOS/%EOS lines that
pan adds are purely that, comments, and do nothing to change the actual
scoring. Knowing that, for me those comments are mostly noise and I
don't use 'em, tho I do have my own explanatory comments when necessary,
and do tend to keep an originating date on any /expiring/ score, just so
I know how long I intended it to run before expiring.
Similarly, on a score line, a % after the score value indicates a comment
and can be used to give the score a name, exactly as you see in my
example.
The [] starts a scoring section and names the newsgroups that section
applies to. Newsgroup entries use * wildcards, not the regex that applies
to the content of most headers. So the [*] I tested says to apply the
following scores regardless of the group name, which was fine for my
tests.
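
To make the wildcard-versus-regex distinction concrete, here is a minimal sketch in Python. It uses the standard fnmatch module as a stand-in for pan's group-name matching, which is an assumption, not pan's actual code. The opensuse group name is hypothetical; the gmane one is taken from the Archived-At example above.

```python
from fnmatch import fnmatchcase

# Group names for illustration; the opensuse one is hypothetical.
GROUPS = [
    "opensuse.org.help.install",
    "gmane.comp.gnome.apps.pan.user",
]

def section_applies(pattern: str, group: str) -> bool:
    """Return True if a [pattern] section would cover this group.

    "*" means "any run of characters" (shell-style wildcard), and "."
    is a literal dot here, unlike in a regex.
    """
    return fnmatchcase(group, pattern)

for pattern in ("*", "*opensuse.org*"):
    covered = [g for g in GROUPS if section_applies(pattern, g)]
    print(f"[{pattern}] covers {covered}")
```

With these inputs, [*] covers both groups, while [*opensuse.org*] covers only the group whose name contains opensuse.org.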
If I had set the first one to =100 instead of a bare 100, it would have
been an absolute score, and any match at that point would prevent pan
from even getting to the next score with that post, since an absolute
score is just that, absolute, and the first such matching score applies,
period. (Of course this is one of the possibilities I list above for why
your test didn't seem to work, that an earlier absolute score match
prevented pan from ever reaching the test score.)
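
The ordering behavior described above can be modeled in a few lines of Python. This is only a sketch of the semantics as described in this post (additive scores accumulate, the first matching absolute score wins and stops further scoring); it is not pan's implementation. The Rule type, the score function, the case-insensitive header-name lookup, and the Approved address are all hypothetical.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    value: int
    absolute: bool  # True for "=nnnn", False for a bare "nnnn"
    header: str
    pattern: str

def score(headers: dict, rules: list) -> int:
    """Walk the rules in file order, as the post describes."""
    # Assumption: header names are compared case-insensitively.
    lowered = {name.lower(): value for name, value in headers.items()}
    total = 0
    for rule in rules:
        text = lowered.get(rule.header.lower())
        if text is None or re.search(rule.pattern, text) is None:
            continue  # header missing or regex did not match
        if rule.absolute:
            return rule.value  # first absolute match ends scoring
        total += rule.value
    return total

# Hypothetical message; the Approved address is a placeholder.
msg = {
    "Approved": "approver@gmane.org",
    "Archived-At": "<http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>",
}

additive = [
    Rule(100, False, "Approved", r"gmane\.org"),
    Rule(200, False, "Archived-at", r"gmane\.org"),
]
absolute_first = [
    Rule(100, True, "Approved", r"gmane\.org"),
    Rule(200, False, "Archived-at", r"gmane\.org"),
]

print(score(msg, additive))        # 300
print(score(msg, absolute_first))  # 100
```

The second rule list differs only in the = on the first score, which is enough to keep the 200 from ever being applied.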
For my testing purposes at least, I didn't need to match the entire
header, just verify that it was there, and that it contained the gmane.org
bit. Thus I didn't need the ^ and $ string beginning and ending
anchors. And of course the \. forces the dot to be matched as a literal
dot, not the "any character" that a dot metacharacter will normally match
in regex.
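
The effect of the anchors and the escaped dot can be checked quickly with Python's re module. Python's regex flavor is used here as a stand-in for whatever engine pan uses, which is an assumption, although these basics behave the same way in the common flavors. The matches helper is hypothetical.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """True if the pattern is found anywhere in the header value."""
    return re.search(pattern, value) is not None

value = "<http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>"

# Unanchored: a substring match is enough.
print(matches(r"gmane\.org", value))    # True

# Anchored: the whole value would have to be exactly "gmane.org".
print(matches(r"^gmane\.org$", value))  # False

# An unescaped dot matches any character; an escaped dot does not.
print(matches(r"gmane.org", "gmaneXorg"))   # True
print(matches(r"gmane\.org", "gmaneXorg"))  # False
```

This is also why a fully anchored pattern fails on any stray character in the header value, such as leading whitespace or a second address.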
Now, after adding that to my scorefile and saving, I had to tell pan
about the scorefile changes. So I selected a message and hit Articles,
Edit Article's Watch/Ignore Score. In the resulting dialog box I simply
hit Close and Rescore, to get pan to reload the changed scorefile.
That did it. Most (cached) posts on gmane now appeared with a 300
score. Again, a few posts did not, because they matched some previous
absolute score and thus were assigned that score and never reached my
test scoring.
Switching groups with that setup was when I really noticed the slowdown
of those arbitrary-header scores, because now pan had to go thru all
cached messages on the new group, checking each one to see if the
arbitrary-header scores matched and scoring as appropriate.
Then I tried downloading new "headers" (really overviews) in subscribed
groups, to check scoring on new messages. That is when pan crashed,
since (as I explained in yesterday's reply) it meant pan scanned a
message in another group that is known to crash pan.
After restarting pan and figuring out what happened (verifying the crash
on getting new headers in subscribed groups another time or two in the
process), I tried getting new headers in /selected/ groups, without the
problem group selected. That worked without crashing!
And as expected, the new scores didn't apply to the just fetched
"headers" (overviews), because the overviews didn't contain the headers I
was trying to score on.
But as soon as I downloaded the actual messages, the new scores applied
as the content was actually there to match against, now. =:^)
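
A small Python sketch of why that happens: a rule on a header can only match once that header is actually present. The overview field set shown is an illustrative subset, the rule_matches helper is hypothetical, and none of this is pan's code.

```python
import re

RULE_HEADER = "Archived-At"
RULE_PATTERN = r"gmane\.org"

def rule_matches(headers: dict) -> bool:
    """A rule matches only if the header exists and the regex is found."""
    value = headers.get(RULE_HEADER)
    return value is not None and re.search(RULE_PATTERN, value) is not None

# What pan has after fetching "headers" (overview data only).
overview = {
    "Subject": "Re: [Pan-users] Scoring based on arbitrary headers?",
    "Date": "Thu, 08 Jan 2015 19:02:20 +0000",
}

# What pan has after the full article is downloaded.
full_article = dict(overview)
full_article["Archived-At"] = (
    "<http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>"
)

print(rule_matches(overview))      # False: header not available yet
print(rule_matches(full_article))  # True: header present and matching
```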
But again, while I didn't see any in my short test (I couldn't get
headers in subscribed groups or in the single affected group, without
crashing, remember, and I didn't like the scanning delay when I switched
groups either, so I had no interest in prolonging the test), had any of
the new messages matched an absolute score reached before my test scores,
of course the test scores wouldn't have applied here either.
Sooo...
What I'd suggest you try next is a more general match, as I did. If you
use gmane you can duplicate my test scores and verify that they're
working for you too, before proceeding.
Once you get something general obviously applying, then home in on your
objective. First try a score like this:
[*]
Score:: =500 % test x-forwarded-for
X-Forwarded-For: .*
That's absolute 500, to hopefully distinguish it from all the absolute
9999/watch scores, assuming you have score-colors set appropriately, and
the score column set to display.
And it should match ANY post in ANY group, that has ANY x-forwarded-for
header set, no matter the content.
Once that is verified to work as expected, narrow it down one factor at a
time:
[*opensuse.org*]
...
First the newsgroup, matching any group name containing opensuse.org.
...
X-Forwarded-For: somedomain\.net
...
Then try a simple general domain match.
That might be narrow enough right there, without a full string match. If
not, continue to narrow it down, until you get a positive match without
too many false-positives.
Of course somewhere in there you can set your desired score, as well.
But don't forget, with absolute scores involved, order matters! So order
accordingly. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman