[Regexp] Bug

gnu-regexp-users

From:	Clemens Wagner
Subject:	[Regexp] Bug
Date:	Mon, 7 Jun 2004 12:45:05 +0200

Hello,

I have found a very mysterious bug in the gnu.regexp package. Consider
the following method:

    // Requires: import gnu.regexp.RE;
    public static void gnu(String inData) throws Exception {
        // First pattern: lazy character class that matches any character.
        RE thePattern = new RE("<html>[\n\\w\\W]*?UTF-8\">");
        long theTime = System.currentTimeMillis();

        thePattern.substitute(inData, "");
        theTime = System.currentTimeMillis() - theTime;
        System.out.println("time: " + theTime);
        // Second pattern: lazy dot, with the dot also matching newlines.
        thePattern = new RE("^.*?UTF-8\">", RE.REG_DOT_NEWLINE);
        theTime = System.currentTimeMillis();
        thePattern.substitute(inData, "");
        theTime = System.currentTimeMillis() - theTime;
        System.out.println("time: " + theTime);
    }

With the input string (the newlines are necessary):
----- 8< -----
<html>





<head>
<title>title</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
----- >8 -----
This method prints out:
time: 106731
time: 8
The first expression takes more than ten thousand times as long as the
second one (106731 ms versus 8 ms). The first pattern also requires more
than 256 MB of memory to execute.

We have observed this behaviour under Linux, Windows and Mac OS X. Thus,
it seems to be a serious bug in gnu.regexp.
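
For comparison, the same measurement could be sketched against
java.util.regex from the JDK. This is only an illustration under stated
assumptions, not gnu.regexp code: the class RegexTiming and its helpers
strip and timeMillis are hypothetical names, Pattern.DOTALL is assumed to
play the role of RE.REG_DOT_NEWLINE, and the input is the string above
without a trailing newline.

```java
import java.util.regex.Pattern;

public class RegexTiming {
    // The input from the report: "<html>", five blank lines, then the head.
    static final String INPUT =
        "<html>\n\n\n\n\n\n<head>\n<title>title</title>\n"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">";

    // Same pattern strings as in the report, compiled with java.util.regex.
    static final Pattern FIRST = Pattern.compile("<html>[\n\\w\\W]*?UTF-8\">");
    // DOTALL lets the dot match newlines (assumed analogue of REG_DOT_NEWLINE).
    static final Pattern SECOND = Pattern.compile("^.*?UTF-8\">", Pattern.DOTALL);

    /** Replaces every match of the pattern in the input with the empty string. */
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    /** Returns the wall-clock time of one replacement, in milliseconds. */
    static long timeMillis(Pattern pattern, String input) {
        long start = System.nanoTime();
        strip(pattern, input);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("time: " + timeMillis(FIRST, INPUT));
        System.out.println("time: " + timeMillis(SECOND, INPUT));
    }
}
```

Both patterns should match the same span here, so a large gap between
the two printed times would point at the matching engine rather than at
the expressions themselves.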

Best regards
        Clemens

--
senior consultant technologie

denkwerk  | vogelsanger straße 66 | d-50823 köln
telefon +49 221 2942 100 | telefax +49 221 2942 101
http://www.denkwerk.com
