[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Stack overflow in regexp matcher
From: |
Stefan Monnier |
Subject: |
Re: Stack overflow in regexp matcher |
Date: |
Sun, 01 Apr 2007 14:50:47 -0400 |
User-agent: |
Gnus/5.11 (Gnus v5.11) Emacs/22.0.96 (gnu/linux) |
> The regexp in question looks like: ".*some text.*". When used with
> string-match, removing both ".*"s shouldn't affect the matching
> semantically, right?
Differences (off the top of my head):
- match-beginning (as well as the return value of string-match)
- .* only matches chars other than \n
- .* matches greedily, so (string-match "foo" "foofoo") will only match the
first "foo", whereas (string-match ".*foo" "foofoo") will match the whole
string (the .* matches the first "foo").
> If so, would it be wise to add a regexp simplification step that, for
> example, could remove leading and trailing ".*"s in regular expressions
> passed to string-match?
In the code that calls string-match, yes, but within string-match it would
be terribly difficult to figure out whether the optimization is harmless.
> I realise now is not the time to talk about feature additions, but it
> would certainly make my life simpler if any regular expression that *can*
> be compiled to a DFA is.
I don't think you'll find much disagreement here.
> (I also wonder where "regular expressions" that are more than regular
> came from. Just because they are useful?)
No idea. Originally, only the \N backref was non-regular, but PCRE
introduced many others. Note that using a DFA to implement unix
regexp-matching is actually pretty difficult because of the precise
semantics of sub-group matching. So most "early" implementations used
backtracking, which made non-regular extensions "free".
Stefan
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: Stack overflow in regexp matcher,
Stefan Monnier <=