RE: [Pan-users] Composing regex for Pan
From: Michael R. McCarrey
Subject: RE: [Pan-users] Composing regex for Pan
Date: Mon, 15 Mar 2004 11:43:27 -0800
On Sun, 2004-03-14 at 06:01, Paul Hudson wrote:
> > >
> > > \b[:upper:]{2,}\b
> > This dumped all replies. The regex animal book doesn't
> > explain those constructs very well (nor have any of the web
> > sites I've looked at).
>
> Have a look at the link I sent - all the info's in there somewhere, I think
> :)
>
> > > http://www.pcre.org/pcre.txt).
Yes, it's in there somewhere, all right <g>. I've already noticed some
interesting elements which may apply. It sure won't hurt to try them out.
>
>
> > > (?-i)\b[A-Z]{2,}\b
> > This works, sort-of, if I select NONE OF:, but things like
> > "!?&" in the string break it.
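For anyone following along, here is a minimal sketch of what that pattern tests, written with Python's re module rather than Pan's own regex engine (an assumption made purely for illustration; the function name is hypothetical). Python's re is case-sensitive by default, so the (?-i) prefix is left out. In this sketch trailing punctuation does not stop the match, but a per-word test also flags ordinary acronyms.

```python
import re

# Two or more capitals forming a whole word. Python's re is case-sensitive
# unless re.IGNORECASE is given, so no (?-i) prefix is needed here.
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def has_all_caps_word(subject: str) -> bool:
    """Return True if any whole word in the subject is two or more capitals."""
    return ALL_CAPS_WORD.search(subject) is not None


print(has_all_caps_word("BUY NOW!?&"))        # True
print(has_all_caps_word("Just a test post"))  # False
print(has_all_caps_word("Re: FAQ update"))    # True (the acronym alone triggers it)
```

The last case suggests that "contains an all-caps word" and "is written entirely in caps" are different tests, which may matter when choosing between ANY OF: and NONE OF:.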
>
> (All the below untested as before)
>
> So, I'm unclear what you want. How about keeping things with at least one
> word with at least one lower case letter in the middle of it?
>
> (?-i)\b.+[a-z].+\b
This logs an error: Can't use regex "(?-i)\b.+[a-z].+\b": Invalid
preceding regex.
>
> > What I've been reading says that the ? refers to "zero or one time"
> > (this must be my "snake & necklace" problem again).
>
> It's the ( followed by ? that is important here - you're correct that ? in
> other contexts means zero or one
> >
> > I want to dump as many of the annoying spam, troll and
> > AOL-keyboard posts as I can, which I think, will require
> > parsing the string's individual characters, multiple times
> > (maybe my approach is flawed?) Once for ALL CAPS (if true,
> > dump the post, regardless of additional characters in the
> > string).
>
> So dump lines that don't match
>
> (?-i)[a-z]
>
> maybe (i.e. lines that don't contain at least one lower case character)
This also logs an error: Can't use regex "(?-i)[a-z]": Invalid preceding
regex. Could this be caused by the condition I set in Pan (NONE OF:)? I
think so, as changing the condition to ANY OF: or ALL OF: does not log an
error. This bites me often.
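To make the intent of that test concrete, here is a small sketch in Python's re module (not Pan's engine, and how Pan evaluates NONE OF: internally is not shown here; the helper name is hypothetical). It treats "dump if no lower case letter" as a negated search, and also requires at least one letter so that purely numeric or punctuation-only subjects are not counted as shouting.

```python
import re

LOWERCASE = re.compile(r"[a-z]")
LETTER = re.compile(r"[A-Za-z]")


def is_shouting(subject: str) -> bool:
    """Return True if the subject has letters but none of them is lower case."""
    has_letter = LETTER.search(subject) is not None
    has_lower = LOWERCASE.search(subject) is not None
    return has_letter and not has_lower


print(is_shouting("BUY NOW!?&"))        # True
print(is_shouting("Just A Test Post"))  # False
print(is_shouting("Re: BUY NOW"))       # False (the "e" in "Re:" is lower case)
```

The third case is worth noticing: a "Re:" prefix supplies a lower case letter, so a reply to an all-caps post would slip past this test unless the prefix is stripped first.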
>
> >After that, it gets interesting. Now we should have
> > mixed-case alpha and/or alpha-numeric (or "should" have).
>
> So, don't do anything with these (leave them with the default score which
> means they'll be shown)
Before they reach the point of being displayed, I want to check those
results and further qualify them.
>
> > Next, filter on multiple instances (2 or more to start) of
> > any non-alpha, printable characters, anywhere in the string.
>
> Do you mean the same character repeated? This one's interesting. I think we
> can use backreferences here....
>
> Keep lines that don't match
>
> ([[:punct:]])\1
Is this like recursion or repeatedly calling a subroutine until a
specified condition is met or one has run out of options?
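As far as I can tell it is neither recursion nor a subroutine call: a backreference such as \1 simply matches the exact text that capture group 1 matched a moment earlier. In PCRE syntax that needs a capturing group, and POSIX classes go inside a bracket expression, i.e. ([[:punct:]])\1. The sketch below shows the same idea in Python's re module, which has no [[:punct:]] class, so the class is built from string.punctuation (ASCII only; that substitution and the function name are my assumptions).

```python
import re
import string

# Bracket expression covering ASCII punctuation. re.escape protects the
# characters that are special inside [...], such as ], ^, - and backslash.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# Group 1 captures one punctuation character; \1 then requires that very
# same character to appear again immediately afterwards.
REPEATED_PUNCT = re.compile("(" + PUNCT_CLASS + r")\1")


def has_repeated_punct(subject: str) -> bool:
    """Return True if some punctuation character occurs twice in a row."""
    return REPEATED_PUNCT.search(subject) is not None


print(has_repeated_punct("Hello!!"))               # True
print(has_repeated_punct("What?!"))                # False (two different characters)
print(has_repeated_punct("Just a Test Post #10"))  # False
```

Note that this only catches adjacent repeats. Catching the same character repeated anywhere in the string would need something like .* between the group and the backreference.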
>
> > Dump the matches. Then filter those results against any other
> > specific criteria until what remains are subjects that look
> > "normal" as in: Just a test post | Just A Test Post | Just a
> > Test Post #10 | any of the previous, prefixed by "Re:", etc.
>
> These should be straightforward?
>
> What are you setting the score to for each of these?
Presently, all scoring is default, as are the rules. I wanted to get a
functioning set of filters before I started messing around with scoring
and the rules.
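To show what I mean by a functioning set of filters, here is a rough sketch of the whole sequence in Python's re module. It is only a stand-in for Pan's filter rules, not how Pan works: the verdict strings and the function name are made up for illustration, and real scoring would replace them. Each subject is checked for all caps first, then for repeated punctuation, and whatever survives is kept.

```python
import re
import string

LOWERCASE = re.compile(r"[a-z]")
LETTER = re.compile(r"[A-Za-z]")
REPEATED_PUNCT = re.compile("([" + re.escape(string.punctuation) + r"])\1")
# One or more leading "Re:" prefixes, in any letter case.
REPLY_PREFIX = re.compile(r"^(?:Re:\s*)+", re.IGNORECASE)


def verdict(subject: str) -> str:
    """Apply the filters in order and report the first one that fires."""
    body = REPLY_PREFIX.sub("", subject)  # judge the subject without "Re:"
    if LETTER.search(body) and not LOWERCASE.search(body):
        return "dump: all caps"
    if REPEATED_PUNCT.search(body):
        return "dump: repeated punctuation"
    return "keep"


for s in ["Just a test post", "Just A Test Post", "Just a Test Post #10",
          "Re: Just a test post", "Re: BUY NOW", "FREE!!! Click here"]:
    print(f"{s!r}: {verdict(s)}")
```

With these sample subjects, the four "normal" ones are kept, "Re: BUY NOW" is dumped as all caps, and "FREE!!! Click here" is dumped for repeated punctuation.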
>