
Re: [Pan-users] nested rules, Re: Pan-users Digest, Vol 119, Issue 9


From: Duncan
Subject: Re: [Pan-users] nested rules, Re: Pan-users Digest, Vol 119, Issue 9
Date: Thu, 20 Dec 2012 21:53:42 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 04c43ec /usr/src/portage/src/egit-src/pan2)

FritzS - gmx posted on Thu, 20 Dec 2012 20:13:18 +0100 as excerpted:

> Now I use this in the score file
> %--------------------------------------------
> % Chamaeleon
> 
> [de.*, at.*]
> Score: =-7444
> User-Agent: MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)
> X-Complaints-To: address@hidden
> 
> %--------------------------------------------
> 
> but the second line are ignored from pan, if I write this false
> "address@hidden" it works too.

I'm having a bit of a parsing problem with that.

You're saying news@ does NOT work (ignored) but newssss@ DOES work 
(works)?  "Too" is normally used to indicate "also", so I'd expect both 
to work, or both not to work, which would agree with my technical 
understanding of the scoring rules, but that's not what the "ignored" on 
one but "works too" on the other one seems to indicate, so I'm confused.

> Here the original NNTP header lines from a message I want to score
> X-Complaints-To: address@hidden
> User-Agent: MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))

Note that I'm reading this thru gmane, which obfuscates parts of strings 
that appear to be email addresses for spam-control reasons.  Therefore I 
don't see the address in your X-Complaints-To line as you typed it, but 
as gmane mangles it, which is... troublesome... when the literal string 
may be important.

If you put spaces around the @ and change it to (at) , however, gmane 
leaves your own obfuscated version alone.  An escaped dot ( \. ) also 
seems to get thru, so your scorefile version address@hidden came thru 
unmangled.  So please use one of those forms (and mention which one if 
it's not clear; the (at) form usually is, based on past experience).

(I've wondered about requesting that gmane turn address obfuscation off 
for this list/group as well as the pan-dev list/group, but I guess Petr 
Kovar is list admin, so he'd need to be the one to email gmane, 
requesting it.)

> Did I adapt this correct for the score file?
> 
> What effected the   ^   and the   $
> sample
> User-Agent: ^MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)$

The ^ and $ are regex beginning and end of line anchors, respectively.  
So a condition line without them would accept the line with a bunch of 
other content at either end, while ^ at the beginning of the line 
indicates the regex match MUST occur at the beginning of the line, and $ 
at the end indicates the regex match MUST occur at the end.  Thus, 
enclosing the regex between ^ and $ indicates that the regex must match 
the ENTIRE line, nothing else before or after the match.
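
To make the anchor behaviour concrete, here's a minimal sketch using 
Python's re module.  That choice is an assumption for illustration only: 
pan's own regex engine may differ in details, tho ^ and $ behave this way 
in the common regex dialects.  The helper name matches() and the padded 
string are made up for the example; the other value is the User-Agent 
header as quoted above.

```python
import re

# User-Agent value from the message quoted above (header name stripped).
ua = "MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))"
# Hypothetical value with extra content at both ends, for comparison.
padded = "Foo MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86)) bar"

# The literal text, with the regex-special . ( ) characters escaped.
core = r"MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8 \(x86\)\)"


def matches(pattern, text):
    """Return True if pattern matches anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


results = {
    "unanchored, exact value": matches(core, ua),
    "unanchored, padded value": matches(core, padded),
    "^...$, exact value": matches("^" + core + "$", ua),
    "^...$, padded value": matches("^" + core + "$", padded),
    "^ only, padded value": matches("^" + core, padded),
}

for label, ok in results.items():
    print(f"{label}: {ok}")
```

Without anchors the pattern is found in both values; with ^ and/or $ the 
padded value is rejected, because the match no longer starts (or ends) 
where the value does.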

As with other regex "special" characters like \|.*?()[]{} , the 
"specialness" can be escaped with the \ (backslash) character, so for 
example, \\ matches a literal backslash and \$ can be used anywhere in 
the line to match a literal dollar sign.  (As usual with regex, . will 
match any character, but you seem to have already noted that, ? means the 
preceding match may or may not appear, and * means any number of the 
preceding, so xa*y will match xay and xaaaaay but not xaabay.  It'll 
also match xy (no a), since zero is a number.)
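
A quick way to check these quantifier rules is to try them.  The sketch 
below again assumes Python's re module as a stand-in (not pan's actual 
engine), with a made-up helper found() and made-up test strings.  It 
uses the pattern x, any number of a, then y, written xa*y.

```python
import re


def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


# * means zero or more of the preceding item.
star = {s: found(r"xa*y", s) for s in ["xay", "xaaaaay", "xy", "xaabay"]}

# ? means the preceding item may or may not appear (zero or one).
optional = {s: found(r"xa?y", s) for s in ["xy", "xay", "xaay"]}

# . matches any single character, but there must be one.
dot = {s: found(r"x.y", s) for s in ["xay", "xby", "xy"]}

# \$ matches a literal dollar sign instead of acting as an anchor.
dollar = {s: found(r"\$100", s) for s in ["win $100 now", "win 100 now"]}

for name, table in [("xa*y", star), ("xa?y", optional),
                    ("x.y", dot), (r"\$100", dollar)]:
    print(name, table)
```

Only xaabay fails the xa*y test, since the b interrupts the run of a's 
before the y is reached.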

Double-check your () vs. \(\) usage as () forms a grouping.  So x(ab)*y 
will match xabababy and xy (zero is a number) but not xay or xabby.  
Again, \(\) escapes the specialness for a literal match.
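
Here's a small sketch of the grouping vs. escaping difference, applied 
to the header and scorefile pattern quoted above.  It assumes Python's 
re module purely for illustration; pan's engine may not behave 
identically.  Note that, at least as the lines appear in this message, 
the scorefile pattern has no space before \(x86\) while the header has 
one.  That may only be a quoting artifact, but it's worth checking.  The 
variable names are made up for the example.

```python
import re

# User-Agent value as quoted from the original headers.
header_value = "MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))"

# Grouping: (ab)* repeats the whole two-character group.
group = re.compile(r"x(ab)*y")
group_results = {s: group.search(s) is not None
                 for s in ["xabababy", "xy", "xay", "xabby"]}

# Pattern as it appears in the quoted scorefile (no space before \(x86\)).
as_quoted = r"MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)"
# Same pattern with the space the header actually contains.
with_space = r"MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8 \(x86\)\)"
# Parentheses left unescaped: they become groups, not literal ( and ).
unescaped = r"MacSOUP/D-2\.8\.3 (Mac OS X version 10\.6\.8 (x86))"

pattern_results = {
    "as_quoted": re.search(as_quoted, header_value) is not None,
    "with_space": re.search(with_space, header_value) is not None,
    "unescaped": re.search(unescaped, header_value) is not None,
}

print(group_results)
print(pattern_results)
```

Under these assumptions only the with_space pattern matches the header: 
the as_quoted form trips over the missing space, and the unescaped form 
looks for the text without any literal parentheses at all.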

(See the misc.taxes example in the documentation for a literal $ match, 
as \$ .  Here's the link again for convenience since I snipped that bit 
above.)

http://www.slrn.org/docs/score.txt

*BUT*, I THINK what you MAY be missing isn't anything to do with regex, 
but rather, the distinction between overview headers and non-overview 
headers.  Headers contained in the overview can be scored before the 
message is downloaded, since they're in the overview.  Headers not in the 
overview cannot be matched until after a message is actually downloaded, 
since they're not available until then (they're not in the overview).

See the second paragraph (with its list of typical overview headers) of 
section 1.1 in score.txt as linked above.

In particular, if you're using an AND scoring condition (single colon 
after the score), which you are, and one of the ANDed conditions matches 
a header in the overview but another does not, you're likely to have 
problems, especially if that score is in one of your automatic action 
zones.

For quite some time, pan's scoring ONLY worked on the overview headers.  
I believe Heinrich patched it to work on ALL headers once a message is 
downloaded.  (I've not actually used that functionality personally, so I 
can't vouch for it actually working.  When I needed it, pan didn't have 
the ability at all, but that was probably 7 or so years ago now.)  Even 
so, scoring is still much more efficient on overview headers, which are 
received BEFORE a message is downloaded, and as I mentioned, ANDed 
scoring combining overview and non-overview headers is likely to be 
problematic.

While scoring after an article is downloaded isn't optimal, it's still 
useful, particularly for negative-scoring/ignoring.  The message must 
still be downloaded, but scoring (especially combined with the automatic 
score-based actions feature) can still spare you from having to actually 
see and deal with the message manually.

Now, neither user-agent nor x-complaints-to are traditionally in the 
overview file, but as score.txt mentions, the admin can add particular 
headers to the overviews as they find them useful.  Thus, it may be that 
one of those headers is in your overview, and scoring against it alone 
works, but attempting to score against both will not work until the 
message is actually downloaded and the other header is available.  I'm 
wondering if that's the problem you're actually seeing.

Of course, if neither one is in your overviews, scoring against either 
one alone (as well as against both, ANDed) would fail, until the message 
was actually downloaded.
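
The overview vs. full-header situation can be sketched as a toy model.  
This is NOT pan's code, just an illustration under stated assumptions: 
the function rule_matches(), the idea that this particular overview 
carries User-Agent but not X-Complaints-To, and the placeholder 
complaints address are all made up for the example.  A condition on a 
header that isn't available simply counts as not matching.

```python
import re


def rule_matches(conditions, headers, require_all=True):
    """Toy model of one score rule.

    conditions: header name -> regex; headers: header name -> value.
    require_all=True models ANDed conditions (single colon after Score),
    require_all=False models ORed conditions (double colon).
    """
    results = []
    for name, pattern in conditions.items():
        value = headers.get(name)  # None if the header isn't available yet
        results.append(value is not None
                       and re.search(pattern, value) is not None)
    return all(results) if require_all else any(results)


# Assumption: this server's overview happens to include User-Agent only.
overview = {
    "Subject": "some subject",
    "User-Agent": "MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))",
}
# After download every header is available (placeholder address).
full = dict(overview)
full["X-Complaints-To"] = "abuse (at) example.invalid"

conditions = {
    "User-Agent": r"^MacSOUP/D-2\.8\.3",
    "X-Complaints-To": r"abuse \(at\) example\.invalid",
}

and_on_overview = rule_matches(conditions, overview)
and_on_full = rule_matches(conditions, full)
or_on_overview = rule_matches(conditions, overview, require_all=False)

print(and_on_overview, and_on_full, or_on_overview)
```

In this model the ANDed rule can't fire from the overview alone, since 
one of its headers isn't there yet, but it does fire once the full 
headers are available.  The ORed form fires from the overview because 
one available header is enough.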

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



