Re: RE for any text, including white space
From: ken
Subject: Re: RE for any text, including white space
Date: Wed, 16 Mar 2011 17:53:04 -0400
User-agent: Thunderbird 2.0.0.24 (X11/20101213)
On 03/16/2011 03:40 PM PJ Weisberg wrote:
> On 3/16/11, ken <gebser@mousecar.com> wrote:
>> What's the RE for any text, white space included? I also want to grab
>> (for match-string...) this text. The text is bounded by known
>> characters. E.g.,
>>
>> <h3>Any Text-- <a name="thisname">
>> Hot Stuff</h3>
>> In the above, how to grab the text of the title, i.e., everything
>> between <h3> and </h3>? Conceivably this title text might contain
>> *anything* except "</[Hh][1-9]".
>>
>
> If A and B are your start and end points, then you want:
>
> "A\\(.\\|\n\\)*?B"
That's almost it, but not quite. It grabs only the one last character
before the "B"; in my example above it grabs just "f". I need to grab:
"Any Text-- <a name="thisname">
Hot Stuff"
-- without the quotes, of course.
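A likely explanation for the lone "f": when a capturing group is itself repeated, (match-string 1) reports only the last repetition. Below is a minimal sketch of one way around that, assuming the repetition is wrapped in an outer capturing group and the inner alternation is made shy with \\(?: ... \\). The helper name my-h3-title is hypothetical, not something from Emacs or from this thread.

```elisp
;; Sketch only: `my-h3-title' is a made-up helper for illustration.
(defun my-h3-title (html)
  "Return the text between <h3> and </h3 in HTML, or nil if there is none."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; The inner \\(?: ... \\) is a shy (non-capturing) group, so the
    ;; outer \\( ... \\) is group 1 and spans the whole repetition.
    ;; The trailing ? keeps the repetition non-greedy.
    (when (re-search-forward "<h3>\\(\\(?:.\\|\n\\)*?\\)</h3" nil t)
      (match-string 1))))

;; Usage with the example from this thread:
(my-h3-title "<h3>Any Text-- <a name=\"thisname\">\nHot Stuff</h3>")
```

With the thread's example this should return the two-line title, newline included, rather than a single character.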
>
> You probably got thrown off by the fact that '.' matches anything
> EXCEPT a newline.
Well, no, I discovered that a long time ago. I'm thrown off by a lot of
things though... like why.... Well, I don't want to throw the thread
off in four other directions, so I won't say.
If what you gave me works to find just the "f" before "</h3", then
something like "<h3>\\(\\[.\n\t ]*\\)</h3" should work, right? Nope.
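For what it's worth, here is a hedged reading of why that attempt fails. The "\\[" escapes the bracket, so no character class is ever opened: the pattern asks for a literal "[", then one character, a newline, a tab, a space, and then any number of "]". Even without the backslash, "." inside brackets is just a literal dot. The sketch below contrasts the attempt with a class that does match any character, built from the standard [:ascii:] and [:nonascii:] classes. The constant names are hypothetical.

```elisp
;; Sketch only: both constant names are made up for illustration.
(defconst my-broken-re "<h3>\\(\\[.\n\t ]*\\)</h3"
  "The attempt from this message, copied as written.")

(defconst my-anychar-re "<h3>\\([[:ascii:][:nonascii:]]*?\\)</h3"
  "Non-greedy run of any characters, newlines included, as group 1.")

(let ((html "<h3>Any Text-- <a name=\"thisname\">\nHot Stuff</h3>"))
  (list
   ;; nil: after <h3> the attempt demands a literal "[".
   (string-match my-broken-re html)
   ;; The full title: the two classes together cover every character.
   (and (string-match my-anychar-re html)
        (match-string 1 html))))
```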
> Regexps are usually assumed to be line-based.
Yeah. That must be a throwback to the mainframe days. And that's
unfortunate.
>
> The '?' is there to make the '*' non-greedy, to prevent it from
> matching everything between the first A and the last B in the whole
> buffer.
I've formulated a lot of other similar REs without using the '?' and
they work fine, so I didn't even try that. Once I find something that
works, it would be interesting then to see the differential effect with
and without it.
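Here is a small sketch of that differential effect, assuming a made-up string with two <h3> elements in it. The helper my-first-capture is hypothetical. A plausible reason the '?' never mattered in single-line REs is that "." stops at a newline, so a greedy "*" cannot run past the end of the line anyway.

```elisp
;; Sketch only: `my-first-capture' is a made-up helper for illustration.
(defun my-first-capture (regexp text)
  "Return group 1 of the first match of REGEXP in TEXT, or nil."
  (when (string-match regexp text)
    (match-string 1 text)))

(let ((text "<h3>One</h3>\n<p>x</p>\n<h3>Two</h3>"))
  (list
   ;; Non-greedy: stops at the first </h3, giving "One".
   (my-first-capture "<h3>\\(\\(?:.\\|\n\\)*?\\)</h3" text)
   ;; Greedy: runs on to the last </h3, swallowing the markup between.
   (my-first-capture "<h3>\\(\\(?:.\\|\n\\)*\\)</h3" text)))
```

The second result covers everything from "One" through "Two", which is the "first A to the last B" behavior PJ describes.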
>
> The double '\'s are necessary in lisp code because it's interpreted as
> a string before it's passed to the regexp engine.
Yeah, I've seen and used a lot of that. Most of the time my first guess
gets it right.
>
> -PJ
Thanks much for the good attempt.
Ken