Re: [Pan-users] nested rules, Re: Pan-users Digest, Vol 119, Issue 9
From: Duncan
Subject: Re: [Pan-users] nested rules, Re: Pan-users Digest, Vol 119, Issue 9
Date: Thu, 20 Dec 2012 21:53:42 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 04c43ec /usr/src/portage/src/egit-src/pan2)
FritzS - gmx posted on Thu, 20 Dec 2012 20:13:18 +0100 as excerpted:
> Now I use this in the score file
> %--------------------------------------------
> % Chamaeleon
>
> [de.*, at.*]
> Score: =-7444
> User-Agent: MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)
> X-Complaints-To: address@hidden
>
> %--------------------------------------------
>
> but the second line are ignored from pan, if I write this false
> "address@hidden" it works too.
I'm having a bit of a parsing problem with that.
You're saying news@ does NOT work (ignored) but newssss@ DOES work
(works)? "Too" is normally used to indicate "also", so I'd expect both
to work, or both not to work, which would agree with my technical
understanding of the scoring rules, but that's not what the "ignored" on
one but "works too" on the other one seems to indicate, so I'm confused.
> Here the original NNTP header lines from a message I want to score
> X-Complaints-To: address@hidden
> User-Agent: MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))
Note that I'm reading this thru gmane, which obfuscates parts of strings
that appear to be email addresses for spam-control reasons. Therefore I
don't see the address in your X-Complaints-To line as you typed it, but
as gmane rewrites it, which is... troublesome... when the literal string
may be important.
If you put spaces around the @ and change it to (at) , however, gmane
leaves that hand-obfuscated version alone. Also, \. seems to get thru, so
your scorefile version address@hidden came thru without being rewritten.
So please use one of those forms (and mention it if it's not clear; the
(at) form usually is, based on past experience).
(I've wondered about requesting that gmane turn address obfuscation off
for this list/group as well as the pan-dev list/group, but I guess Petr
Kovar is list admin, so he'd need to be the one to email gmane,
requesting it.)
> Did I adapt this correct for the score file?
>
> What effected the ^ and the $
> sample
> User-Agent: ^MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)$
The ^ and $ are regex beginning and end of line anchors, respectively.
So a condition line without them would accept the line with a bunch of
other content at either end, while ^ at the beginning of the line
indicates the regex match MUST occur at the beginning of the line, and $
at the end indicates the regex match MUST occur at the end. Thus,
enclosing the regex between ^ and $ indicates that the regex must match
the ENTIRE line, nothing else before or after the match.
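As a quick check of the anchor behaviour described above, here is a minimal sketch using Python's re module as a stand-in for pan's own regex engine. That substitution is an assumption on my part; the two may differ in corner cases, though not for plain anchors and escaped literals. The header value is the User-Agent line quoted earlier in this thread, and the helper name matches() is purely illustrative.

```python
import re

# User-Agent value as quoted earlier in the thread.
header = "MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))"

def matches(pattern, text):
    """True if the pattern is found anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None

# No anchors: the pattern may sit anywhere inside the header value.
unanchored = matches(r"Mac OS X", header)

# ^ only: the match must start at the beginning of the value.
start_anchored = matches(r"^Mac OS X", header)

# ^ and $: the pattern must account for the entire value.
prefix_only = matches(r"^MacSOUP/D-2\.8\.3$", header)
whole_value = matches(
    r"^MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8 \(x86\)\)$", header
)

# The pattern as posted has no space before \(x86\), while the quoted
# header has one, so as archived here it fails with or without anchors.
as_posted = matches(
    r"MacSOUP/D-2\.8\.3 \(Mac OS X version 10\.6\.8\(x86\)\)", header
)

print(unanchored, start_anchored, prefix_only, whole_value, as_posted)
```

If the spacing in the quoted lines survived the archive intact, the last case suggests the missing space before \(x86\) is worth checking in the score file, independent of the anchors.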
As with other regex "special" characters like \|.*?()[]{} , the
"specialness" can be escaped with the \ (backslash) character, so for
example, \\ matches a literal backslash and \$ can be used anywhere in
the line to match a literal dollar sign. (As usual with regex, . will
match any character, as you seem to have already noted; ? means the
preceding match may or may not appear; and * means any number of the
preceding, so xa*y will match xay and xaaaaay but not xaabay. It'll
also match xy (no a), since zero is a number.)
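The quantifier and escaping rules above can be tried out directly. The sketch below again assumes Python's re module behaves like pan's regex engine for these basics, and it writes the star example as xa*y so that the * applies to the a. The helper whole() and the sample strings other than the x...y ones are made up for illustration.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# * means any number (including zero) of the preceding item.
star_one = whole(r"xa*y", "xay")
star_many = whole(r"xa*y", "xaaaaay")
star_zero = whole(r"xa*y", "xy")
star_other = whole(r"xa*y", "xaabay")   # the b is not allowed

# ? means the preceding item may appear once or not at all.
opt_zero = whole(r"xa?y", "xy")
opt_one = whole(r"xa?y", "xay")
opt_two = whole(r"xa?y", "xaay")

# . matches any single character, but there must be one.
dot_letter = whole(r"x.y", "xay")
dot_none = whole(r"x.y", "xy")

# Escaping removes the special meaning.
literal_dollar = re.search(r"\$1", "refund of $100") is not None
anchor_dollar = re.search(r"$1", "refund of $100") is not None
literal_backslash = re.search(r"\\", "C:\\temp") is not None

print(star_one, star_many, star_zero, star_other)
print(opt_zero, opt_one, opt_two, dot_letter, dot_none)
print(literal_dollar, anchor_dollar, literal_backslash)
```

The unescaped $1 never matches because $ there is still the end-of-line anchor, and nothing can follow the end of the line.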
Double-check your () vs. \(\) usage, as () forms a grouping. So x(ab)*y
will match xabababy and xy (zero is a number) but not xay or xabby.
Again, \(\) escapes the specialness for a literal match.
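To make the grouping point concrete, here is a small sketch, once more assuming Python's re module is a fair stand-in for pan's regex handling. The pattern is written as x(ab)*y so the group sits between the literal x and y; whole() is an illustrative helper, not anything from pan.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (ab)* repeats the whole group "ab", zero or more times.
samples = ("xabababy", "xy", "xay", "xabby")
group_repeat = [whole(r"x(ab)*y", s) for s in samples]

# Unescaped parentheses only group; they do not match ( and ) themselves.
grouped_vs_literal_text = whole(r"(x86)", "(x86)")
grouped_vs_bare_text = whole(r"(x86)", "x86")

# Escaped parentheses match the literal characters.
escaped_vs_literal_text = whole(r"\(x86\)", "(x86)")

print(group_repeat)
print(grouped_vs_literal_text, grouped_vs_bare_text, escaped_vs_literal_text)
```

So a header containing a literal "(x86)" needs \(x86\) in the score file, which is what the escaped form in the original rule is doing.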
(See the misc.taxes example in the documentation for a literal $ match,
as \$ . Here's the link again for convenience since I snipped that bit
above.)
http://www.slrn.org/docs/score.txt
*BUT*, I THINK what you MAY be missing isn't anything to do with regex,
but rather, the distinction between overview headers and non-overview
headers. Headers contained in the overview can be scored before the
message is downloaded, since they're in the overview. Headers not in the
overview cannot be matched until after a message is actually downloaded,
since they're not available until then (they're not in the overview).
See the second paragraph (with its list of typical overview headers) of
section 1.1 in score.txt as linked above.
In particular, if you're using an AND scoring condition (single colon
after the score), which you are, and one of the ANDed conditions matches
a header in the overview but another does not, you're likely to have
problems, especially if that score is in one of your automatic action
zones.
For quite some time, pan's scoring ONLY worked on the overview headers.
I believe Heinrich patched it to work on ALL headers once a message is
downloaded. (I've not actually used that functionality personally, so I
can't vouch for it actually working. When I needed it, pan didn't have
the ability at all, but that was probably 7-7 years ago now.) Scoring is
still much more efficient on overview headers received BEFORE a message
is downloaded, though, and as I mentioned, ANDed scoring combining
overview and non-overview headers is likely to be problematic.
While scoring after an article is downloaded isn't optimal, it's still
useful, particularly for negative-scoring/ignoring, since while the
message must still be downloaded, scoring (especially combined with the
automatic scoring based actions feature) can still avoid you having to
actually see and deal with the message manually.
Now, neither user-agent nor x-complaints-to is traditionally in the
overview file, but as score.txt mentions, the server admin can add
particular headers to the overviews as they find them useful. Thus, it may be that
one of those headers is in your overview, and scoring against it alone
works, but attempting to score against both will not work until the
message is actually downloaded and the other header is available. I'm
wondering if that's the problem you're actually seeing.
Of course, if neither one is in your overviews, scoring against either
one alone (as well as against both, ANDed) would fail, until the message
was actually downloaded.
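The overview vs. full-header distinction can be sketched as a toy model. To be clear, this is NOT pan's implementation, just an illustration of why an ANDed rule behaves differently before and after download. The overview header set, the function names, and the sample article (including the (at) address) are all assumptions made up for the example.

```python
import re

# Assumption: a typical overview header set; a server admin may add more.
OVERVIEW_HEADERS = {"subject", "from", "date", "message-id",
                    "references", "bytes", "lines"}

def visible_headers(article, downloaded, overview=OVERVIEW_HEADERS):
    """Headers a scorer can see: everything after download, else overview only."""
    if downloaded:
        return article
    return {k: v for k, v in article.items() if k.lower() in overview}

def and_rule_matches(conditions, headers):
    """ANDed rule: every condition must find its header and match it."""
    for name, pattern in conditions.items():
        value = headers.get(name)
        if value is None or re.search(pattern, value) is None:
            return False
    return True

# Hypothetical article; only the User-Agent value comes from the thread.
article = {
    "Subject": "test",
    "From": "someone (at) example.invalid",
    "User-Agent": "MacSOUP/D-2.8.3 (Mac OS X version 10.6.8 (x86))",
    "X-Complaints-To": "abuse (at) example.invalid",
}
ua_only = {"User-Agent": r"MacSOUP/D-2\.8\.3"}
both = dict(ua_only, **{"X-Complaints-To": r"abuse \(at\) example\.invalid"})

# Neither header in the overview: nothing matches until download.
both_before = and_rule_matches(both, visible_headers(article, False))
both_after = and_rule_matches(both, visible_headers(article, True))

# Suppose the admin added User-Agent to the overview.
extended = OVERVIEW_HEADERS | {"user-agent"}
ua_before_ext = and_rule_matches(ua_only, visible_headers(article, False, extended))
both_before_ext = and_rule_matches(both, visible_headers(article, False, extended))

print(both_before, both_after, ua_before_ext, both_before_ext)
```

In this model the single-header rule fires from the overview alone once the admin has added that header, while the ANDed rule still has to wait for the download, which is the pattern of symptoms described above.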
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman