Re: Discrepancy in definition/use of match-data?

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Discrepancy in definition/use of match-data?

From:	David Kastrup
Subject:	Re: Discrepancy in definition/use of match-data?
Date:	11 Jun 2004 01:56:39 +0200
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

Richard Stallman <address@hidden> writes:

>     I think the fix would be to have match-limit return Qnil instead of
>     flagging an error for the condition
> 
>       n >= search_regs.num_regs
> 
>     Correct?
> 
> I agree with you.

I have thought a bit about the context in which I have seen this error
condition flagged in the last few days.  The first example was inside
of split-string (the 21.4.13 XEmacs version) when running inside of a
process filter: the routine was buggy since it could call
(match-beginning 0) even when there were no preceeding _successful_
matches.  However, since the current API will not change the match
data after an unsuccessful match, in most uses this bug was masked by
some old existing data in (match-beginning 0).  So only the filter
routine (which starts out with a fresh copy of void match-data)
provided a sufficient context for flagging the bug.  In particular,
such a bug is very evasive in debugging, since the situation of
pristinely void match-data will then typically be absent.

In consequence, this bug persisted for half a year _after_ it was
observed.  If I change the test as indicated above, the bug would not
even get flagged anymore, not even in a process filter, and would go
completely undetected.  Not good.

Now the obvious solution to that would be to make unsuccessful
matches void the match-data, too.  I myself have no recollection of
any discussions about this, but Stephen Turnbull was pretty vocal
about having proposed something like this, but that apparently the
idiom

(while (search-forward "Something"))
do something with (match-beginning 0)

was given as a reason why an unsuccessful match should not clear the
match-data.  But actually, the documentation of the match-data
function claims:

    Return value is undefined if the last search failed.

Invalidating the match-data after unsuccessful matches would quite
increase the probability of catching such bugs, while being consistent
with the current documentation of match-data.  However, if this
suggestion has been turned down with the above example, then this
"idiom" needs to get searched and replaced.  It is my personal opinion
that this idiom is not worth the loss of bug detection, and it is
inconsistent with the current match-data documentation, anyway.

But it would probably be a bad idea to change this right now in the
feature freeze, possibly breaking things relying on that quite
undocumented behavior.

So my proposal is the following plan:

Before next release:
match-data gets voided upon entry of a filter or sentinel, like it is
being done now.  With void match-data, match-beginning and so on flag
an error irrespective of their argument.  The match-data is only
touched by a successful match.  Once a match has been successful,
match-beginning and so on will not flag errors for positive
arguments, but return nil (as documented).

In order to have a better chance of catching such
use-before-valid-match situations, it might be a good idea also to
void the match-data in the main loop.

After the next release, I would like to have unsuccessful matches
also void the match-data, making the use of the above quoted idiom
illegal.  It is consistent with the current documentation of the
functions, and it gives much better possibilities of actually
catching bugs that at the moment only trigger errors in obscure,
badly debuggable places like output filters.

We should do this change right after the next release so that Emacs
core developers and those for add-on packages have enough time testing
it, finding the problematic code and fixing it for the next release
after that.

What I'll do right now is just changing the condition that it will not
flag an error for greater-than-encountered match-data indices, except
for the case of completely void match-data where I'll still flag an
error.  So my current change will flag fewer errors than before (and
in particular leave out false alarms like in locate.el).

This change does not merit discussion before application: it only
fixes things without breaking existing idioms.  My other proposals
(such as voiding the match-data in the main loop) would warrant other
opinions, however.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

[Prev in Thread]

Current Thread

[Next in Thread]

Discrepancy in definition/use of match-data?, David Kastrup, 2004/06/09
- Re: Discrepancy in definition/use of match-data?, Richard Stallman, 2004/06/10
  - Re: Discrepancy in definition/use of match-data?, David Kastrup <=
    - Re: Discrepancy in definition/use of match-data?, Stephen J. Turnbull, 2004/06/11
    - Re: Discrepancy in definition/use of match-data?, David Kastrup, 2004/06/11
    - Re: Discrepancy in definition/use of match-data?, Stephen J. Turnbull, 2004/06/12
    - Re: Discrepancy in definition/use of match-data?, David Kastrup, 2004/06/12
    - Re: Discrepancy in definition/use of match-data?, Richard Stallman, 2004/06/12
    - Re: Discrepancy in definition/use of match-data?, Stephen J. Turnbull, 2004/06/14
    - Re: Discrepancy in definition/use of match-data?, David Kastrup, 2004/06/14
    - Re: Discrepancy in definition/use of match-data?, Stephen J. Turnbull, 2004/06/14
    - Re: Discrepancy in definition/use of match-data?, Stephen J. Turnbull, 2004/06/16
    - Re: Discrepancy in definition/use of match-data?, Richard Stallman, 2004/06/18

Prev by Date: Re: IPv6
Next by Date: TODO's
Previous by thread: Re: Discrepancy in definition/use of match-data?
Next by thread: Re: Discrepancy in definition/use of match-data?
Index(es):
- Date
- Thread