[bug #38594] Poor grep performance for long regexp compared to performan
From: Jaroslav Škarvada
Subject: [bug #38594] Poor grep performance for long regexp compared to performance with -P option
Date: Tue, 26 Mar 2013 07:46:57 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0
URL:
<http://savannah.gnu.org/bugs/?38594>
Summary: Poor grep performance for long regexp compared to
performance with -P option
Project: grep
Submitted by: yarda
Submitted on: Tue 26 Mar 2013 07:46:55 AM GMT
Category: None
Severity: 3 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
This was originally reported in:
http://bugzilla.redhat.com/show_bug.cgi?id=875131
There is a huge gap between the performance of grep and grep -P for certain
regular expressions.
Steps to Reproduce:
1.
PATTERN="^.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
00000000000"
INPUTLINE="..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
00000000000"
for i in `seq -w 1 10000`; do echo ${i}${INPUTLINE}${i} >> /tmp/input ;done
2. time grep -P -v "$PATTERN" /tmp/input
3. time grep -v "$PATTERN" /tmp/input
4. time grep -v "^.\{1143\} 0\{11\}" /tmp/input
5. time grep -P -v "^.{1143} 0{11}" /tmp/input
6. export LANG=C
7. repeat 2., 3.
8. export LANG=en_US.iso88591
9. repeat 2., 3.
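The long literal pattern in step 1 is awkward to paste by hand, so the following is a minimal sketch of one way to generate it. It assumes the literal pattern is 1143 dots, one space, and eleven zeros (the expansion of "^.\{1143\} 0\{11\}"), and that INPUTLINE uses the same number of dots, which the report does not state explicitly. The helper names dots and make_input are hypothetical and not part of the original report.

```shell
# Print a string of N dots (hypothetical helper, not from the report).
dots() {
    printf "%${1}s" '' | tr ' ' '.'
}

# Assumption: 1143 dots, matching the compact pattern "^.\{1143\} 0\{11\}".
DOTS=$(dots 1143)
PATTERN="^${DOTS} 00000000000"
# Assumption: the input line uses the same dot count as the pattern.
INPUTLINE="${DOTS} 00000000000"

# Write N numbered lines to a file: $1 = line count, $2 = output path.
make_input() {
    for i in $(seq -w 1 "$1"); do
        echo "${i}${INPUTLINE}${i}"
    done > "$2"
}
```

With these definitions, `make_input 10000 /tmp/input` should produce an input file of the same shape as in step 1, after which the timing commands from the remaining steps can be run unchanged.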
Actual results:
grep -P is 300 to 7000 times faster than grep without the -P option. This holds
for all combinations of LANG and pattern form: $PATTERN, or
"^.\{1143\} 0\{11\}" (respectively "^.{1143} 0{11}" with -P).
Expected results:
The performance of grep with and without -P is comparable when using the same pattern.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?38594>