[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: |
Dimitry Andric |
Subject: |
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate |
Date: |
Mon, 27 Mar 2023 19:54:27 +0200 |
On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
>
> Running the command:
>
> echo a | grep -E -w '((()|a)|())*'
>
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
>
> This is the smallest case that could be found; simplifying anything in the
> input and/or expression leads to correct behavior.
Reproducible with GNU grep 3.7 on Ubuntu 22:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E
-w ((()|a)|())*
It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3),
and it is that implementation which ends up in an endless loop.
It loops between lines 1415, 1417 and 1443, but idx and cur_node never change
from 0:
1378 static reg_errcode_t
1379 __attribute_warn_unused_result__
1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t
nmatch,
1381 regmatch_t *pmatch, bool fl_backtrack)
1382 {
...
1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
1416 {
1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
1418
1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
...
1442 /* Proceed to next node. */
1443 cur_node = proceed_next_node (mctx, nmatch, pmatch,
prev_idx_match,
1444 &idx, cur_node,
1445 &eps_via_nodes, fs);
1446
1447 if (__glibc_unlikely (cur_node < 0))
...
1465 }
1466 }
-Dimitry
P.S.: Interestingly this does not reproduce with BSD grep, which returns
immediately with "a".
signature.asc
Description: Message signed with OpenPGP