emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Discrepancy in definition/use of match-data?


From: David Kastrup
Subject: Re: Discrepancy in definition/use of match-data?
Date: 12 Jun 2004 11:03:53 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

"Stephen J. Turnbull" <address@hidden> writes:

> >>>>> "David" == David Kastrup <address@hidden> writes:
> 
>     David> How strong were the results from the regression test?
> 
> What does that mean?  There is one test that specifically checks for
> match-data being preserved across a failed match, and it failed.

Ah, ok.  I thought that the regression test might have involved
calling a lot of complex functions to see whether they still worked.
It dod not occur to me that it checked for the particular semantics
explicitly.

>     David> What kind and amount of code appears to be affected?
> 
> Kind?  I know of one specific use in `w3-configuration-data' in
> w3-cfg.el which calls itself recursively, and depends on being able
> to use the top-level match-data if the match in the recursive call
> fails, while using the match-data from the recursive call otherwise.
> Too sneaky to live, I suppose you could say.

Sneakiness is all fine.  It is just the question whether it is worth
the price.  It would be a good idea to check this price out after the
next release, as said.

> Amount?  How should I know?  I can say I've tried inserting warning
> code in match_limit and file-name-sans-extension produces the
> warning during the dump phase.  I don't know why yet, my
> implementation may be incorrect (the flag is either getting reset
> when it shouldn't, or fails to get set on a successful search---the
> Boyer-Moore code is the most complicated thing I've ever tried to
> deal with).
> 
> However, I know that file-handlers can get called there, via
> file-name-directory inter alia.  In general, even a simple variable
> reference can call arbitrary Lisp code in XEmacs (because of magic
> Mule handlers) and most likely GNU Emacs (because those handlers
> were introduced for GNU Emacs compatibility).

In which case this particular sort of references better use
save-match-data.

>     David> With the current code.  But I was also proposing to void
>     David> the match-data in the main loop: that would produce
>     David> errors more often, and only in situations where the
>     David> match-data was completely unpredictable to start with
>     David> (and possibly undefined, too).
> 
> But that's not true.  The match-data was sufficiently predictable
> that split-string only failed when it was voided.

The code in question was the following:

  (let (parts (start 0) (len (length string)))
    (if (string-match pattern string)
        (setq parts (cons (substring string 0 (match-beginning 0)) parts)
              start (match-end 0)))

The error occured when the condition was false to start with and setq
not reached.  Then we had the following:

    (while (and (< start len)

This condition is true for non-empty string

                (string-match pattern string (if (> start (match-beginning 0))
                                                 start
                                               (1+ start))))

start is still 0, match-beginning has a random value (this is where
the error got flagged).  Without an error, we'll get into the false
branch, so the match starts at 1 now.  Since the precondition was only
true for non-empty string, starting the match at 1 is valid, will
again fail like the first match, and the loop body gets skipped.

So this time we are lucky: the random contents of match-beginning
would indeed not matter.  Would you want to rely on that?

> I think it's likely that that mistake has been made elsewhere.

I doubt that we can rely on its consequences being benign.

> On the other hand, pretty much any time any Lisp code intervenes
> between the return of the matching function and entry to the match
> data access there is an opportunity for a hook or handler to be
> called.

That's what save-match-data is for.  Hooks or handlers that intervene
at random places need to use it when they might likely call regexp
code.  This is nothing new.

> Probability of hitting it on any given path is low, but potential
> for random annoyance for years to come is high, I fear.

It is obvious that you are not fond of the interface, presumably
preferring a complex return value for matching functions including the
match data.  But it is not like it would be feasible to change it,
anyway.

The only addition I can think of is to offer an explicit function
void-match-data that users would be free to call at points when they
know the match-data no longer to be needed.  Having such a function
explicitly available might help for debugging use-before-definition
cases like the above.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum




reply via email to

[Prev in Thread] Current Thread [Next in Thread]