Regular expressions with high or nested repetition counters

bug-grep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Regular expressions with high or nested repetition counters

From:	Dag Hovland
Subject:	Regular expressions with high or nested repetition counters
Date:	Mon, 03 May 2010 09:51:36 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Thunderbird/3.0.4

Hi!

I am a PhD student doing some research in regular expressions. I have
implemented an algorithm for matching a subclass of the regular
expressions with "numerical constraints", that is, expressions like
"a{10,100}". My starting point was the problems that can occur when
using large numbers, or nested constraints, as for example in the command:

grep -E "([0-9]{1,2}h([1-5]?[0-9]m([1-5]?[0-9]s){1,60}){1,60}){0,100}"

which consumes over 2 gb of memory on my computer.


from the grep man-page:
"Large repetition counts in the {n,m} construct may cause grep to use
lots of  memory."

An article describing the algorithm is available at
http://hdl.handle.net/1956/3628

A C-implementation is available at
http://www.ii.uib.no/~dagh/facgrep

The implementation is probably buggy, as I am not very good in C, but it
works for simple examples. On the example above, procps reports that it
uses around 2 megabytes of memory.

I wondered if you are interested in having this kind of functionality in

GNU grep? Note that for the majority of normal/simple regexps, grep doesbetter than my program above. Also, it only handles a (well-defined)subclass of regexps. So we would need to find a nice way to choose whichalgorithm to use when.


Best regards,

Dag Hovland

smime.p7s
Description: S/MIME Cryptographic Signature

[Prev in Thread]

Current Thread

[Next in Thread]

Regular expressions with high or nested repetition counters, Dag Hovland <=
- Re: Regular expressions with high or nested repetition counters, Paul Eggert, 2010/05/05
  - Re: Regular expressions with high or nested repetition counters, Dag Hovland, 2010/05/05

Prev by Date: Re: dfa.c problems on MirBSD...
Next by Date: [PATCH] grep: remove redundant syntax bit
Previous by thread: [PATCH] maint: remove useless #if HAVE_STDLIB_H
Next by thread: Re: Regular expressions with high or nested repetition counters
Index(es):
- Date
- Thread