gnu-regexp-users

[Regexp] Bug


From: Clemens Wagner
Subject: [Regexp] Bug
Date: Mon, 7 Jun 2004 12:45:05 +0200

Hello,

I have found a very mysterious bug in the gnu.regexp package. Consider
the following method:

    // requires: import gnu.regexp.RE;
    public static void gnu(String inData) throws Exception {
        // Pattern 1: lazy match over a class containing \n, \w and \W
        RE thePattern = new RE("<html>[\n\\w\\W]*?UTF-8\">");
        long theTime = System.currentTimeMillis();

        thePattern.substitute(inData, "");
        theTime = System.currentTimeMillis() - theTime;
        System.out.println("time: " + theTime); // elapsed milliseconds
        // Pattern 2: lazy '.' that also matches newlines (REG_DOT_NEWLINE)
        thePattern = new RE("^.*?UTF-8\">", RE.REG_DOT_NEWLINE);
        theTime = System.currentTimeMillis();
        thePattern.substitute(inData, "");
        theTime = System.currentTimeMillis() - theTime;
        System.out.println("time: " + theTime); // elapsed milliseconds
    }

With the following input string (the newlines are necessary):
----- 8< -----
<html>





<head>
<title>title</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
----- >8 -----
This method prints:

    time: 106731
    time: 8

The first expression takes more than ten thousand times as long as the
second one (106731 ms vs. 8 ms, measured with System.currentTimeMillis()),
and it requires more than 256 MB of memory to execute.

We have observed this behaviour on Linux, Windows and Mac OS X. Since it
occurs on all three platforms, it appears to be a serious bug in gnu.regexp.
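For comparison, here is a minimal sketch of the same two substitutions written against the JDK's own java.util.regex package (Java 1.4 and later) instead of gnu.regexp. It assumes that Pattern.DOTALL is the counterpart of RE.REG_DOT_NEWLINE and that the input ends with a newline after the meta tag. It also includes a regex-free variant based on String.indexOf, which might serve as a workaround when the goal is only to drop everything up to the first `UTF-8">` marker. The class and method names are hypothetical. The sketch only checks that the three approaches agree on this input; it does not reproduce the gnu.regexp timings or explain their cause.

```java
import java.util.regex.Pattern;

public class HeaderStripComparison {

    // Same input as in the report: "<html>", five empty lines, then the head.
    // Assumption: the input ends with a newline after the meta tag.
    static final String INPUT =
        "<html>\n\n\n\n\n\n"
        + "<head>\n"
        + "<title>title</title>\n"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n";

    static final String MARKER = "UTF-8\">";

    // Pattern 1 from the report, compiled with java.util.regex (not gnu.regexp).
    static String stripWithClass(String in) {
        return Pattern.compile("<html>[\n\\w\\W]*?UTF-8\">").matcher(in).replaceFirst("");
    }

    // Pattern 2; Pattern.DOTALL is assumed to play the role of RE.REG_DOT_NEWLINE.
    static String stripWithDotAll(String in) {
        return Pattern.compile("^.*?UTF-8\">", Pattern.DOTALL).matcher(in).replaceFirst("");
    }

    // Regex-free variant: drop everything up to and including the first marker.
    static String stripWithIndexOf(String in) {
        int i = in.indexOf(MARKER);
        return i < 0 ? in : in.substring(i + MARKER.length());
    }

    public static void main(String[] args) {
        String a = stripWithClass(INPUT);
        String b = stripWithDotAll(INPUT);
        String c = stripWithIndexOf(INPUT);
        System.out.println("same result: " + (a.equals(b) && b.equals(c)));
    }
}
```

On this input all three variants should leave only the trailing newline. Because the second pattern is anchored with `^` and matches lazily up to the first marker, the indexOf variant removes exactly the same prefix without any backtracking.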

Best regards
        Clemens

--
senior consultant technology

denkwerk  | vogelsanger straße 66 | d-50823 köln
phone +49 221 2942 100 | fax +49 221 2942 101
http://www.denkwerk.com
