Re: [Pan-users] Filtering
From: Duncan
Subject: Re: [Pan-users] Filtering
Date: Fri, 16 Sep 2016 21:43:18 +0000 (UTC)
User-agent: Pan/0.141 (Tarzan's Death; GIT 4e0db5ff8)
DLSauers posted on Fri, 16 Sep 2016 14:18:38 +0000 as excerpted:
> In one particular group I haunt the alot of cruft gets crossposted in
> for non related topics...
>
> I heavily filter this group, but could probably gut down on adding
> filters daily and/or the existing ones if I could just get pan to filter
> out things.
>
> What I am after is something along...
>
> Lets say the group is x
>
> If the post has MORE than group X and contains *.politics.* etc... mark
> it -99999999999999999999999999999999999999999999999999999 or what ever..
>
> None of the options for scoring rules seems to allow this or work, the
> only way to filter this stuff is set up a lot of rules like
>
> contains Hillary contains Trump contains gay contains ......
>
> Just being able to if post is xposted to more than 1 group ie X mark it
> -9999 would nuke a lot of stuff....
>
> Or is pan not able to setup such advanced scoring filters via the GUI
> and/or otherwise????
>
> This group is rather problematic, and always has been.. It has the
> biggest fitlering/killfile and well the only filtering and killfil I've
> used on Usenet in 30+ years!
>
> Any hints on getting more advanced filtering done???
First the general stuff, since you didn't indicate whether you already
know it. If you're a list regular you might, as I've posted it here many
times over the years. Otherwise you likely won't, unless you've used the
other clients previously and compared pan's scorefile with theirs.
Pan's scorefile format is in general a less advanced implementation of
SLRN's scorefile format (without the fancy stuff such as includes...):
http://slrn.sourceforge.net/docs/score.txt
... but with the case insensitivity (but not the other changes) of xnews
(my link for that one is dead, but slrn is primary, so it's not worth
trying to google or otherwise resurrect the xnews one).
Here's the abridged version of the format description I keep as comments
in my own scorefile:
% [newsgroup.*] wildcard (not regex) format (~ negates).
% header lines regex. (~ negates).
% Score conditions, single : and, double :: or.
% Expires: immed. below score if present.
% Leading % indicates comment
% Leading whitespace and blank lines ignored.
% Regex and newsgroup matches case insensitive with
% keyword:, sensitive with keyword=.
% Newsgroup change delimits section,
% Score delimits "rule", multiple rules per section allowed.
% Comment after score becomes rule "name".
% Score levels: <=-9999 kill, -9998 to -1 low,
% 0, 1 - 4999 med, 5000 - 9998 high, >=9999 watch
** EXCEPT: Unfortunately, the last time I investigated, pan's scoring had
a bug and would **NOT** do logical AND -- the single : was treated as OR
(::) regardless. Fortunately, most of my scoring (and I guess pretty
much anyone else's) is single-shot OR logic anyway, so that's not as big
a deal as if OR logic were broken instead of AND, but it /does/ rather
kill a direct implementation of your AND test above... if the bug still
exists, which I suppose it does, but I haven't tested recently.
However, it's /somewhat/ possible to work around that limitation by
judicious use of additive scoring. As an example, use two rules that
each add -5000, so they combine to -10000 and trigger the kill level.
If you have other rules that add, say, +100, and a message triggers them
as well, it'll end up at -9900 and not be killed, but that's a good
thing, as it makes the scheme far more flexible. Tune the values to
taste: make the two rules -9998 each, so each one /almost/ kills and no
trivial +100 can undo the kill when both match; or make them both -4950,
so a further trivial -100 is needed as well to kill; or...
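The arithmetic of that workaround can be sketched in a few lines of
Python. This is only a toy model, not pan's code: the level() and total()
names are made up for illustration, and the thresholds are simply the
score levels from the scorefile comments above.

```python
# Toy model of additive scoring; NOT pan's actual implementation.
# Thresholds follow the score levels listed in the scorefile comments.

def level(score):
    """Map a total score to the level names used in the comments above."""
    if score <= -9999:
        return "kill"
    if score < 0:
        return "low"
    if score == 0:
        return "zero"
    if score < 5000:
        return "med"
    if score < 9999:
        return "high"
    return "watch"

def total(*rule_scores):
    """Additive rules simply sum; each argument is one matching rule."""
    return sum(rule_scores)

# Two -5000 rules: both must match to reach the kill level...
print(level(total(-5000)))                # low
print(level(total(-5000, -5000)))         # kill
# ...but an unrelated +100 rule rescues the article.
print(level(total(-5000, -5000, 100)))    # low
# Two -9998 rules: each alone almost kills, and +100 can't undo both.
print(level(total(-9998)))                # low
print(level(total(-9998, -9998, 100)))    # kill
# Two -4950 rules need a further -100 to reach the kill level.
print(level(total(-4950, -4950)))         # low
print(level(total(-4950, -4950, -100)))   # kill
```

Note that this models plain additive values only; whether a given rule
adds to the score or sets it outright depends on how the Score line is
written in the scorefile.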
The other thing that should stick out as pretty important from the above
rules, once you understand a leading % indicates a comment, when looking
at the rules pan creates if you use its gui to create rules, is that:
** Most of the lines pan adds to the scorefile are simply extra
explanatory comments -- they don't actually affect the rules at all and
deleting many of them can help massively shrink your scorefile without
affecting actual scorefile logic at all.
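If you'd rather not delete those comment lines by hand, something along
these lines would do it. It's a minimal sketch, assuming only what the
format notes above say (leading % means comment, blank lines ignored);
strip_comment_lines() and the sample text are invented for the example,
and you'd want to run anything like it on a copy of the scorefile.

```python
# Sketch: drop full-line % comments and blank lines from scorefile text.
# Trailing comments on Score lines are kept, since they name the rule.

def strip_comment_lines(text):
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # comment-only or blank line: no effect on scoring
        kept.append(line)
    return "".join(line + "\n" for line in kept)

# Hypothetical scorefile fragment, written by hand for this example.
sample = """\
% Hypothetical GUI-style entry
% Created by hand for this example
[alt.*]
Score: =-9999 %Alt kill
   From: sex coed

"""
print(strip_comment_lines(sample), end="")
```

This removes every comment-only line, including any you might prefer to
keep, so treat it as a starting point rather than a finished tool.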
Finally, if you've been using pan's GUI to create most of your scores and
haven't edited or have only lightly edited the scorefile itself, and you
do a LOT of scoring, you should be able to *greatly* optimize things with
some rather more active manual scorefile formatting and editing. For
instance, a short excerpt from the alt.* spam-kill section of my own
scorefile:
Warning, adult themed example!
%#####################################################################
%#####################################################################
[alt.*]
Score:: =-9999 %Alt kill
From: Seeking teens
From: teens seeker
From: ^LoLiTa <
From: ^GOBLIN <
From: sex coed
From: NudeGirls
From: voyeur only
From: amateur
From: SEXmag
From: teens
From: intermixed
From: rectal
Subject: adult movies
Subject: dupped
Subject: ^\([-0-9/]*\)
Subject: Use critical pack from Microsoft Corporation
Subject: R/-\\PE
Subject: R/-\|PE
Subject: Horny mom
Subject: rectal exam
Subject: body cavity
Subject: mature women
Subject: candid voyeur
Just imagine how many lines that would take if they were each
individually added as separate rules, complete with multiple comment
lines each, by pan's GUI. Here, they're both easily human-read, and far
easier and more efficient for pan to parse.
The down side to this level of scorefile editing, of course, is that in
order to maintain it, you pretty much have to either add new entries
manually, or pretty regularly go in and reoptimize all the entries you've
added via the pan GUI since the last time you cleaned up.
The up side is of course that once you have it cleaned up, it's dead easy
to manually add an additional single-line entry.
Meanwhile, a few hints:
* Set a pan hotkey for the "edit article's watch/ignore score" function
in the articles menu. From there you can hit the close and rescore
button, to rescore based on any manual edits you just made to the
scorefile. That's the easiest way I've come up with to get pan to
reapply freshly manually edited scores.
* Use %#### or similar comment lines to visually separate sections, as I
did in the example above.
* Consider whether you want an expiring or permanent score. Permanent
scores can be easily added to the nicely edited groups manually, while
it's tougher to group expiring scores since the expires line will differ,
so adding these via the pan gui works well enough.
* Consider adding a %### separator line or two at the bottom of your
permanent scores, so pan can append the expiring scores you add via the
GUI, and it's easier to go in and clean up later since you know where the
new ones start. Talking about which...
* Pan doesn't clean up expired scores on its own. You'll have to go thru
and weed them out once in a while. (After doing so a few times, you may
find yourself not adding so many expiring scores, choosing instead to
either add a permanent one or simply skip it, so you don't have to clean
up the expired score later. But if you're like me you'll still add a
few, for people irritating enough to want to score down temporarily, but
who you think might still learn some maturity, in say a year or so, so
you don't want to make it permanent just yet.)
* For expiring scores, I've found it helpful to keep pan's "created by
Pan on <date>" comments, as that way I not only know when it expires, but
I know when it was created, and thus have some idea of how irritated I
was when I created the entry, based on how long I set it to last before
expiring.
*** Pan can score based on any header, not just the ones the GUI allows
you to score. However, headers that aren't in the overviews as sent from
the server won't apply until the message is actually downloaded to cache,
making them much less efficient since you won't be able to see the effect
until the message is already downloaded and in cache. That's a
limitation of the protocol (and overviews) that pan can't do anything
about, but sometimes, having to download a message before it can be
killed is still better than having to actually read it.
*** The above should let you manually add scores based on either the
Newsgroups header (as opposed to the [*] section head specifier, which
matches the newsgroup you're actually in at the time) or the Xref header.
Both contain the list of cross-posted groups. The Xref header lists only
the groups carried on that server, along with the message number for the
message in each of those groups, while the Newsgroups header lists all
the groups the message was posted to, regardless of whether that server
carries them or not. However, I'm not sure whether these rules will apply
before or after download, due to the above-mentioned overviews issue.
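To make the difference between those two headers concrete, here is a
small Python sketch that pulls the group list out of each. The header
values, server name and function names are invented for illustration;
only the general shapes (comma-separated groups for Newsgroups, a server
name followed by group:number pairs for Xref) are assumed.

```python
# Sketch of how the two header values differ; sample values are invented.

def groups_from_newsgroups(value):
    """Newsgroups: comma-separated list of every group posted to."""
    return [g.strip() for g in value.split(",") if g.strip()]

def groups_from_xref(value):
    """Xref: server name, then group:number pairs for carried groups."""
    pairs = value.split()[1:]  # skip the leading server name
    return [p.rsplit(":", 1)[0] for p in pairs if ":" in p]

newsgroups = "alt.example.x,alt.politics.example,talk.example"
xref = "news.example.invalid alt.example.x:1234 alt.politics.example:5678"

print(groups_from_newsgroups(newsgroups))
print(groups_from_xref(xref))
```

In this made-up case the server doesn't carry talk.example, so it shows
up in the Newsgroups list but not in the Xref list.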
Those last two hints should allow you to score based on crossposting to
N+ groups, provided you know enough about the crossposted group names in
advance to create a score for them. Alternatively, scoring on Xref and
counting the number of colons should allow you to score on a message
posted to N+ groups regardless of name, provided the server carries that
many of the groups and thus crossposts the message to them. But again,
I'd not know for sure without actually testing it, whether such scores
could be applied before download, with only the overviews information
available, or if they could only be applied after download. Either way,
it should be possible, but one will obviously be far more convenient than
the other.
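As a rough illustration of the colon-counting idea, the Python sketch
below builds a regex that matches only when a header value contains at
least N colons. The helper name and sample values are hypothetical, and
whether pan's own regex engine accepts the same {N} repetition syntax in
a scorefile line is an assumption that would need testing.

```python
import re

def xref_at_least(n):
    """Regex that matches a value containing at least n colons,
    i.e. at least n group:number pairs in an Xref-style value."""
    return re.compile(r"(?:[^:]*:){%d}" % n)

three_groups = "news.example.invalid alt.a:1 alt.b:2 alt.c:3"
two_groups = "news.example.invalid alt.a:1 alt.b:2"

print(bool(xref_at_least(3).search(three_groups)))  # True
print(bool(xref_at_least(3).search(two_groups)))    # False
```

The pattern deliberately ignores group names, so it catches crossposts
to any N or more groups the server carries, which is the case where you
don't know the offending group names in advance.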
And again, as I said above, tho I believe the AND logic bug will prevent
combining both an N newsgroups and a subject line filter into one,
requiring both, by using multiple scoring rules and adjusting the scores
applied by each, you should be able to approximate the same thing.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman