bug-grep

dfa.h / dfa.c diff versus gawk attached


From: Aharon Robbins
Subject: dfa.h / dfa.c diff versus gawk attached
Date: Thu, 06 Sep 2007 17:54:38 +0300
User-agent: Mutt/1.5.14 (2007-02-12)

Greetings.

Attached is a diff of the grep 2.5.3 dfa.h and dfa.c against the current
versions of the same files in the gawk CVS. (Or, they'll be in CVS within
an hour or so. :-)

The changes fall into two categories: bug fixes, mostly having to do
with multibyte character sets, and reviving the DFA matcher's ability
to match across newlines, which grep doesn't need but which gawk does.
The latter changes the interface to dfaexec.

I believe that the grep developers have had most of these changes in the
pipeline for a while, but I thought it wouldn't hurt to submit a fresh
set of diffs.

One new thing is that I have added the ability to let the caller of
the dfa routines know that the matcher is broken in certain cases. The
only case I know of at the moment is

        (foo){0}
        (foo){0,0}

which the DFA matcher treats as (foo){1}, whereas regex correctly does
not match "foo".  The problem is in the DFA parser, in the code that
builds the parse tree representing the DFA ... I could not see how to
work around it there, or anywhere else in the code. (Fixes welcome!)
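
For reference, the expected behavior can be checked against a POSIX
regcomp/regexec implementation. This is only a minimal sketch, not code
from gawk or grep: the helper name matches_whole is made up for
illustration, and the patterns are assumed to carry their own ^...$
anchors, since an unanchored (foo){0} would trivially match the empty
string anywhere in the input.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of dfa.c or regex): return true if
 * `text` matches the POSIX extended regular expression `pattern`.
 * The caller is expected to anchor the pattern with ^ and $ when a
 * whole-string match is wanted. A pattern that fails to compile is
 * reported as "no match".
 */
bool matches_whole(const char *pattern, const char *text)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/*
 * Expected with a correct matcher:
 *   matches_whole("^(foo){0}$",   "foo")  is false
 *   matches_whole("^(foo){0}$",   "")     is true
 *   matches_whole("^(foo){0,0}$", "foo")  is false
 *   matches_whole("^(foo){1}$",   "foo")  is true
 */
```

A matcher that treats (foo){0} as (foo){1} would get the first three of
these cases wrong.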

It remains my hope that "one day" the grep distribution will return
to being the canonical source for dfa.h and dfa.c, and that I can
synchronize from it (as I do with GLIBC, for example) rather than the
other way around.

Thanks,

Arnold
-- 
Aharon (Arnold) Robbins                                 arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 206 350 8765
Nof Ayalon              Cell Phone: +972 50  729-7545
D.N. Shimshon 99785     ISRAEL

Attachment: grep-diff
Description: Text document

