Re: [Pan-users] Filtering out responses to specific posters


From: Duncan
Subject: Re: [Pan-users] Filtering out responses to specific posters
Date: Wed, 23 Sep 2015 02:39:49 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT c9c83f3)

JCA posted on Tue, 22 Sep 2015 08:43:23 -0600 as excerpted:

> Here's the issue: Filtering out posts from well known cranks and trolls
> is easy to do with Pan. However, I would also like to be able to filter
> out any responses to such posts, no matter who is responsible for the
> response. Can this be done with Pan? The obvious approach would be to
> filter out on the basis of the contents of each post - but I think that
> Pan does not have such a capability.
> 
> Is the problem that I am posing solvable under Pan?

It depends.

What you'd need to find is something consistent but also unique in the 
message-id header of the troll/OP, since in general that's the only 
element of the message that is included in all followups, and thus can be 
used to score all followups.

The message-id header is supposed to be unique to each message, as it is
how the message is tracked.  Beyond some general rules governing its form
(the pattern, which is similar to an email address, address@hidden, and
the characters it can contain), the exact method for ensuring uniqueness
is left up to the ID generator.  That can be either the news client
posting the message or, if the client didn't set a message-id, the server
the message is originally uploaded to.

As a result, the contents of the message-id header vary greatly.  But
depending on what actually set it in an individual case, individual
posters, or at least individual news clients and news providers, often
have parts of the header that remain reasonably consistent, while of
course other parts make it unique.  At a minimum, this is often the
@sample.org domain-name part, but often something on the other side of
the @ is reasonably stable as well and can be combined with it, thus
helping to differentiate between posters within the same domain or
organization.

In terms of scoring (and in pan, the watch/ignore stuff is actually very
high/low scores, 9999 or higher for watch, -9999 or lower for ignore),
the ideal of course is if the target uses a stable email address and the
entire email address is included as part of message-id, naturally with
some changing time-date/sequential/random number attached as well, to
help ensure that the message-id as a whole is unique.
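
To make the "stable part vs. changing part" idea concrete, here's a rough
Python sketch that takes a handful of message-ids copied from one
poster's articles and reports what stays constant.  This is only an
illustration for eyeballing a pattern, not anything pan does itself, and
the sample IDs (jdoe, sample.org) are made up.

```python
import os


def split_message_id(msg_id):
    """Split '<left@right>' into its (left, right) halves."""
    core = msg_id.strip().strip("<>")
    left, _, right = core.rpartition("@")
    return left, right


def stable_parts(msg_ids):
    """Return (common prefix of the left halves, set of right halves)."""
    halves = [split_message_id(m) for m in msg_ids]
    lefts = [left for left, _ in halves]
    rights = {right for _, right in halves}
    # commonprefix works character by character on any list of strings
    return os.path.commonprefix(lefts), rights


# Hypothetical message-ids, as they might be copied from one poster's headers
sample = [
    "<jdoe.1a2b3c@sample.org>",
    "<jdoe.9f8e7d@sample.org>",
    "<jdoe.445566@sample.org>",
]

prefix, domains = stable_parts(sample)
print("stable left-side prefix:", repr(prefix))
print("right-side domains seen:", domains)
```

If the prefix comes back empty and the domain set has more than one
entry, there's probably nothing stable to score on.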

In practice, a pattern guaranteed to match everything that specific
poster sends, and nothing from anyone else, is rather rare.  But unless
the poster is deliberately trying to avoid such scoring/filtering, for
example by posting without a consistent @domain.name portion of the
string, there's at least a fair chance you can find a pattern that
matches _only_ that regular poster and no other regulars, as long as you
limit the score to either that specific group or a small list of related
groups that tend to have the same regular posters.

The score may well also hit an occasional non-regular poster who happens
to have a similar message-id string (say they both post using the same
client, which sets the message-id using the domain name from the email
address, and they both happen to use addresses with the same provider).
If you're willing to pay that price and lose the occasional random
non-regular poster, the odds of matching only a single regular are best
when no other regulars use the same email domain (if the client builds
the message-id from that) or the same news server (if the client leaves
message-id to the server, which will then almost certainly use the same
domain-name portion of the string for everyone).
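
Before committing to a pattern, it's worth checking it against
message-ids from the group's other regulars.  A minimal Python sketch of
that check follows; the poster names, IDs and both candidate patterns are
invented for the example, and you'd collect the real IDs by hand from the
headers of a few posts per regular.

```python
import re


def false_positives(pattern, ids_by_poster, target):
    """Names of posters other than target whose message-ids the pattern hits."""
    rx = re.compile(pattern)
    return sorted(
        poster
        for poster, ids in ids_by_poster.items()
        if poster != target and any(rx.search(i) for i in ids)
    )


def missed(pattern, ids_by_poster, target):
    """Message-ids of the target that the pattern fails to hit."""
    rx = re.compile(pattern)
    return [i for i in ids_by_poster[target] if not rx.search(i)]


# Hypothetical sample: a few message-ids per regular poster
ids_by_poster = {
    "troll": ["<jdoe.1a2b3c@sample.org>", "<jdoe.9f8e7d@sample.org>"],
    "alice": ["<alice.778899@sample.org>"],
    "bob": ["<5601f3a2$0$1234@news.example.net>"],
}

domain_only = r"@sample\.org>"              # too broad: also hits alice
with_left = r"<jdoe\.[^@>]*@sample\.org>"   # left-side prefix plus domain

for candidate in (domain_only, with_left):
    print(candidate,
          "hits others:", false_positives(candidate, ids_by_poster, "troll"),
          "misses:", missed(candidate, ids_by_poster, "troll"))
```

Here the domain alone also catches another regular at the same provider,
while adding the stable left-side prefix narrows it to the one poster.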


Once you find such a uniquely identifying (at least among regular posters)
message-id pattern, you'll set the scores to match on it in _references_,
since that's where it'll appear in followups.  Note that this will _not_
match the troll's own posts (except when they're following up to their
own message), as there the pattern appears in message-id, not in
references -- it's only in references in _followup_ messages.  So you'll
probably want a second, separate match, on message-id, to catch the troll
themselves as well.
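
The two-rule logic, one match on references and one on message-id, can be
sketched in Python like this.  It only models the idea; it is not pan's
scoring code or its scorefile syntax, the headers are plain dicts, and
the pattern and sample articles are made up.  The -9999 is the ignore
level mentioned above.

```python
import re

IGNORE = -9999  # ignore-level score, as described above


def score(headers, pattern):
    """Score one article given its headers (a dict) and a message-id pattern."""
    rx = re.compile(pattern)
    # Rule 1: the pattern in Message-ID catches the troll's own posts.
    if rx.search(headers.get("Message-ID", "")):
        return IGNORE
    # Rule 2: the pattern in References catches followups to those posts,
    # no matter who wrote the followup.
    if rx.search(headers.get("References", "")):
        return IGNORE
    return 0


pattern = r"<jdoe\.[^@>]*@sample\.org>"  # hypothetical troll pattern

troll_post = {"Message-ID": "<jdoe.9f8e7d@sample.org>"}
followup = {
    "Message-ID": "<alice.778899@sample.org>",
    "References": "<jdoe.9f8e7d@sample.org>",
}
unrelated = {
    "Message-ID": "<5601f3a2$0$1234@news.example.net>",
    "References": "<alice.112233@sample.org>",
}

for name, art in (("troll", troll_post), ("followup", followup),
                  ("unrelated", unrelated)):
    print(name, score(art, pattern))
```

Dropping rule 1 would leave the troll's thread-starting posts visible
while hiding every reply to them.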

If however the troll is deliberately randomizing message-id strings, not 
just parts of strings, there will likely be no pattern to match.  
However, the ability to play such tricks does require a client that gives 
you direct control over your message-ids, and while spammers and serious 
trolls are likely to have such tools, the ordinary joe-troll level of 
troll isn't likely to have such access, thus giving you a reasonable 
chance against them that you won't have against those with serious attack/
spam level tools.

And of course if the troll is using a popular posting server for that 
newsgroup and both he and other regular posters are using clients that 
leave the message-id to the server, it's not going to work.  Oh, well...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



