
[Pan-users] Re: Composing regex for Pan


From: Duncan
Subject: [Pan-users] Re: Composing regex for Pan
Date: Fri, 12 Mar 2004 19:32:00 -0700
User-agent: Pan/0.14.2.91 (As She Crawled Across the Table)

Michael R. McCarrey posted <address@hidden>,
excerpted below,  on Mon, 08 Mar 2004 11:42:48 -0800:

> Earlier in the list, someone commented that Pan's regex parsing was case
> insensitive (and had to be enabled manually - "how" was not mentioned),
> which explains, I think, why [A-Z] [^a-z] makes everything disappear.

I believe that "someone" was me.  I have mentioned the case insensitivity
before.  There's a good reason I didn't mention /how/ to enable sensitivity
-- I don't know.  <g>  I think I mentioned that as well.  Everywhere else
I've used regex, matching is case sensitive unless insensitive is
specified, so I don't know how to reverse it, or even if it's possible.
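
For anyone wanting to see the effect in isolation, here is a minimal sketch
using Python's re module as a stand-in.  It is not Pan, and Pan uses PCRE
rather than Python, so treat it as an illustration of the general behaviour
only.  The scoped (?-i:...) form on the last pattern is ordinary Perl/PCRE
syntax as well, but whether Pan's filter honours it is an assumption I have
not tested.  The helper name matches() is made up for the example.

```python
import re

def matches(pattern, text, flags=0):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text, flags) is not None

# Case-sensitive: [A-Z] finds nothing in an all-lowercase string.
sensitive = matches(r"[A-Z]", "all lowercase")

# Case-insensitive: the same class now matches lowercase letters too.
folded = matches(r"[A-Z]", "all lowercase", re.IGNORECASE)

# Likewise, [^a-z] can no longer pick out an uppercase letter.
negated = matches(r"[^a-z]", "SHOUTING", re.IGNORECASE)

# Scoped option (Python 3.6+): switch folding back off for this group only.
scoped = matches(r"(?-i:[A-Z])", "all lowercase", re.IGNORECASE)

print(sensitive, folded, negated, scoped)  # False True False False
```

With folding on, [A-Z] and [a-z] are effectively the same class, so the two
filters in the quoted message cannot tell upper from lower.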

I'd suggest trying stuff like [:upper:] and [:lower:], the POSIX character
classes.  (Do note that these only work inside a bracket expression.  In
other words, wrap them in a second set of square brackets for actual use:
[[:upper:]] matches just upper case, while [_[:upper:]] matches upper case
or the underscore.)  Again, I haven't tried it, so I don't know for sure if
it'll work.
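
To make the bracket nesting concrete, here is a rough sketch in Python.
Python's standard re module does not implement POSIX classes, so the
sketch expands the two names by hand into ASCII ranges first; the
POSIX_ASCII table and the expand_posix() helper are hypothetical, made up
for this example, and cover ASCII letters only.  A PCRE-based app would
take the [[:upper:]] form directly if it supports it.

```python
import re

# ASCII-only stand-ins for two POSIX class names (hypothetical table).
POSIX_ASCII = {"[:upper:]": "A-Z", "[:lower:]": "a-z"}

def expand_posix(pattern):
    """Replace POSIX class names with equivalent ASCII ranges."""
    for name, ascii_range in POSIX_ASCII.items():
        pattern = pattern.replace(name, ascii_range)
    return pattern

# The outer brackets are the bracket expression; the inner [:upper:] is
# just one member of it, alongside anything else listed there.
upper_only = re.compile(expand_posix("[[:upper:]]"))             # [A-Z]
upper_or_underscore = re.compile(expand_posix("[_[:upper:]]"))   # [_A-Z]

print(bool(upper_only.fullmatch("_")))            # False
print(bool(upper_or_underscore.fullmatch("_")))   # True
```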

Then there's esoteric stuff like perl's unicode properties, using the \p
metachar (or \P for negation). I'm nowhere NEAR sure the PCRE (perl
compatible regular expression) library Pan uses is advanced enough to be
THAT compatible, but it's worth a shot. There's a whole bestiary of these
things, but the ones you'd be interested in here would be \p{IsLower} and
of course its counterpart \p{IsUpper}.  (The Is portion should be
optional, but again, even if pcre does properties, it may not understand
them without the Is prefix.)

Another thing to try would be the \x and/or \nnn metachars.  \x followed
by two hex digits matches the character with that hex code.  \nnn, where
the n's are octal digits, is the equivalent using the octal code.  Of
course, doing it this way may or may not be possible, but even if
possible, it would mean a character class enumeration or at minimum a
character class range such as [\x00-\xff] (all byte values would match,
leaving as an exercise for the reader figuring out which byte value range
to use for uppercase characters <g>).
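
For reference, ASCII 'A' through 'Z' run from hex 41 to 5A, which is octal
101 to 132.  A minimal sketch of both range forms, again in Python's re
module rather than Pan, and ASCII-only by construction.  The helper name
upper_chars() is made up for the example.

```python
import re

# ASCII 'A'..'Z' is 0x41..0x5A in hex, 101..132 in octal.
upper_hex = re.compile(r"[\x41-\x5A]")
upper_oct = re.compile(r"[\101-\132]")

def upper_chars(regex, text):
    """Collect every character in text that the class matches."""
    return "".join(regex.findall(text))

print(upper_chars(upper_hex, "Pan Users"))  # PU
print(upper_chars(upper_oct, "Pan Users"))  # PU
```

Spelling the range out by code does not get around case folding by itself:
if the engine is matching case-insensitively, this range is folded just as
[A-Z] would be.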

Less esoteric, but I've only seen them used in replacements and don't know
if they work in matches, are the \l \L \u \U metachars.  \l and \u
lowercase or uppercase just the next char; \L and \U do the same for
everything up to \E.  Again, these are normally used to force upper or
lower case in replacements, not for matching, and that's in general, not
in Pan specifically, but it might be worth a shot.
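
As a rough picture of what those do on the replacement side, here is a
sketch in Python.  Python's re has no \u or \U in replacement strings, so
a callback stands in for them; the Perl one-liners in the comments are the
forms being imitated.  This says nothing about whether they work for
matching in Pan.

```python
import re

text = "pan users list"

# Like Perl's s/\b(\w)/\u$1/g : uppercase only the next character.
first_upper = re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)

# Like Perl's s/(users)/\U$1\E/ : uppercase everything up to \E.
word_upper = re.sub(r"(users)", lambda m: m.group(1).upper(), text)

print(first_upper)  # Pan Users List
print(word_upper)   # pan USERS list
```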

If you get any of these (or anything else) to work, be sure to post
which, as it could be useful for others.  Here, in particular, it'd be
useful to know that pcre does indeed support beast X of the perl regex
bestiary, as there are other apps using pcre as well, meaning it'd be
useful across all of them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin





