Re: bug in gawk 3.1.1?
From: Aharon Robbins
Subject: Re: bug in gawk 3.1.1?
Date: Thu, 5 Sep 2002 13:44:59 +0300
In article <address@hidden>,
Stepan Kasal <address@hidden> wrote:
>Hello Lorenzo,
>
>On Wed, 04 Sep 2002 17:58:27 +0200, LorenzAtWork wrote:
>> address@hidden (Aharon Robbins) wrote:
>> > [...]
>> > I fixed this basically by using a heuristic. If the RS regex
>> > ends in ?, *, or +, and the end of the regex match is within
>> > a few bytes of the end of the buffer, then read in some more
>> > text and try again.
>> > [...]
>
>> /a(b+c)*bc/
>
>Or:
> /a(|bcbbc)bc/
Yeah, I was hoping no-one would notice. Oh well.
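To see why these two patterns slip past the "ends in ?, *, or +" test, it may help to compare match lengths on a truncated buffer and on the full text. The sketch below uses the POSIX <regex.h> interface as a stand-in; gawk itself uses its own regex wrappers, and match_len() is a hypothetical helper, not part of gawk.

```c
#include <regex.h>
#include <stddef.h>

/*
 * match_len --- length of the leftmost match of the POSIX extended
 * regex `pattern' in `text', or -1 if the pattern does not compile
 * or nothing matches.  Hypothetical helper for illustration only.
 */
long
match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m[1];
	long len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	if (regexec(&re, text, 1, m, 0) == 0)
		len = (long) (m[0].rm_eo - m[0].rm_so);

	regfree(&re);
	return len;
}
```

With a leftmost-longest matcher, match_len("a(b+c)*bc", "abc") should be 3, while match_len("a(b+c)*bc", "abcbbcbc") should be 8. If the input buffer happens to end right after "abc", the shorter match is taken as the record separator even though the regex text ends in a plain "c".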
>Thank you very much. I guess you are right, and the heuristic should
>be applied whenever the regex _contains_ any of the following chars:
>
> ? * + |
>
>Extending the heuristic this way should not break anything; it has the
>same properties as Aharon's original:
>
>1) it catches some of the problem cases (though not all, of course;
>consider "a.*Z" with an occurrence of "Z" at the end of file)
>
>2) it doesn't represent any problem except slight memory inefficiency
>when non-trivial RE's are used as RS.
>
>Have a nice day,
> Stepan Kasal
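As a rough side-by-side of the two tests discussed above, here is a minimal sketch. The function names are made up for illustration, and neither function is the code from the patch that follows; they only contrast "last character is a repetition operator" with "any of ? * + | appears anywhere".

```c
#include <stddef.h>
#include <string.h>

/* ends_in_repeat --- original test: does the RE text end in + * or ? */

int
ends_in_repeat(const char *text, size_t len)
{
	if (len == 0 || text[len - 1] == '\0')
		return 0;

	return strchr("+*?", text[len - 1]) != NULL;
}

/* contains_variable_op --- extended test: any of * + | ? anywhere */

int
contains_variable_op(const char *text, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* skip NUL: strchr() would match the string terminator */
		if (text[i] != '\0' && strchr("*+|?", text[i]) != NULL)
			return 1;
	}

	return 0;
}
```

For "a(b+c)*bc" and "a(|bcbbc)bc" the first test returns 0 and the second returns 1, which is the gap the two counterexamples point out. For a plain string such as "abc" both return 0.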
Here's a patch, relative to yesterday's. Have fun.
Arnold
------------------------------------------
*** awk.h.save Wed Aug 21 15:40:04 2002
--- awk.h Thu Sep 5 13:07:46 2002
***************
*** 1002,1007 ****
--- 1002,1008 ----
extern void resyntax P((int syntax));
extern void resetup P((void));
extern int reisstring P((char *text, size_t len, Regexp *re, char *buf));
+ extern int remaybelong P((char *text, size_t len));
/* strncasecmp.c */
#ifndef BROKEN_STRNCASECMP
*** re.c.save Wed Aug 21 13:52:10 2002
--- re.c Thu Sep 5 13:07:22 2002
***************
*** 284,308 ****
{
static char metas[] = ".*+(){}[]|?^$\\";
int i;
- int has_meta = FALSE;
int res;
char *matched;
/* simple checking for has meta characters in re */
for (i = 0; i < len; i++) {
if (strchr(metas, text[i]) != NULL) {
! has_meta = TRUE;
! break;
}
}
/* make accessable to gdb */
matched = &buf[RESTART(re, buf)];
- if (has_meta)
- return FALSE; /* give up early, can't be string match */
-
res = STREQN(text, matched, len);
return res;
}
--- 284,317 ----
{
static char metas[] = ".*+(){}[]|?^$\\";
int i;
int res;
char *matched;
/* simple checking for has meta characters in re */
for (i = 0; i < len; i++) {
if (strchr(metas, text[i]) != NULL) {
! return FALSE; /* give up early, can't be string match */
}
}
/* make accessable to gdb */
matched = &buf[RESTART(re, buf)];
res = STREQN(text, matched, len);
return res;
}
+
+ /* remaybelong --- return TRUE if the RE contains * ? | + */
+
+ int
+ remaybelong(char *text, size_t len)
+ {
+ while (len--) {
+ if (strchr("*+|?", *text++) != NULL) {
+ return TRUE;
+ }
+ }
+
+ return FALSE;
+ }
*** io.c.fix1 Wed Sep 4 13:17:37 2002
--- io.c Thu Sep 5 13:23:41 2002
***************
*** 2630,2642 ****
* This matches the "xyz" and ends up putting the
* "abc" into the front of the next record. Ooops.
*
! * The test for a *, +, or ? at the end of the RE
! * is a heuristic (spelled k l u d g e).
*/
/* succession of tests is easier to trace in GDB. */
if (iop->cnt != EOF) {
! if (strchr("+*?", RS->stptr[RS->stlen-1]) != NULL) {
! if ((iop->end - (start+REEND(rsre,start))) < RS->stlen) {
bp = iop->end;
continuing = TRUE;
continue;
--- 2630,2647 ----
* This matches the "xyz" and ends up putting the
* "abc" into the front of the next record. Ooops.
*
! * The remaybelong() function looks to see if the
! * regex contains one of: + * ? |. This is a very
! * simple heuristic, but in combination with the
! * "end of match within a few bytes of end of buffer"
! * check, should keep things reasonable.
*/
/* succession of tests is easier to trace in GDB. */
if (iop->cnt != EOF) {
! if (remaybelong(RS->stptr, RS->stlen)) {
! char *matchend = start + REEND(rsre, start);
!
! if (iop->end - matchend < RS->stlen) {
bp = iop->end;
continuing = TRUE;
continue;
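
The succession of tests in the io.c hunk can be read as a single decision: read more input only when not at EOF, the RE might match something longer, and the match ends within RS->stlen bytes of the end of the buffer. A standalone sketch of that decision follows; the function and parameter names are hypothetical and do not appear in gawk, and using the length of the RS text as the "few bytes" threshold simply mirrors the hunk above.

```c
#include <stddef.h>

/*
 * should_read_more --- decide whether to refill the buffer and retry
 * the RS match.  bytes_after_match is the distance from the end of
 * the match to the end of the buffered data (assumed non-negative).
 */

int
should_read_more(size_t bytes_after_match, size_t rs_len,
		int at_eof, int re_may_be_long)
{
	if (at_eof)
		return 0;	/* nothing more to read */

	if (! re_may_be_long)
		return 0;	/* RE has none of + * ? | */

	/* match ends close to the end of the buffer: try again */
	return bytes_after_match < rs_len;
}
```

For example, with a 9-byte RS text, a match that ends 2 bytes before the end of the buffer asks for more input, while one that ends 100 bytes before it does not.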
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. address@hidden
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 928 569 9018
Nof Ayalon Cell Phone: +972 51 297-545
D.N. Shimshon 99785 ISRAEL