help-gawk

Re: pcre in gawk?


From: Z
Subject: Re: pcre in gawk?
Date: Mon, 10 Jul 2023 22:25:25 +0000
User-agent: mail v14.9.24

Miriam English <mim@miriam-english.org> wrote:

> Does anybody know if there is any way to use the pcre library to get a
> more extensive regex with gawk? In particular I want to use
> non-greedy matches, such as:
> 
> *?        Match 0 or more times, not greedily
> +?        Match 1 or more times, not greedily
> ??        Match 0 or 1 time, not greedily
> {n}?      Match exactly n times, not greedily (redundant)
> {n,}?     Match at least n times, not greedily
> {n,m}?    Match at least n but not more than m times, not greedily

I don't think there is (yet) a gawk extension library for PCRE, but there
is one for TRE, a regex library that does approximate (fuzzy) matching:
https://laurikari.net/tre/faq/
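
When the non-greedy match only has to stop at a single delimiter
character, plain awk can often get the same effect without PCRE by using
a negated bracket expression in place of '.*?'. A minimal sketch of that
workaround (a different technique from PCRE, and first_tag is just a
hypothetical name for the wrapper so it can be run from a shell):

```shell
#!/bin/sh
# first_tag: print the first <...> span found on each input line.
# Hypothetical helper name; needs only a POSIX awk.
first_tag() {
  awk '
    {
      # /<.*>/ is greedy and would run to the last ">" on the line.
      # /<[^>]*>/ cannot cross a ">", so it stops at the first one,
      # which is what the PCRE pattern <.*?> would do.
      if (match($0, /<[^>]*>/))
        print substr($0, RSTART, RLENGTH)
    }
  '
}

printf '%s\n' '<b>bold</b> and <i>italic</i>' | first_tag
```

This only helps when the terminator is one character; it does not
generalize to multi-character terminators or to the counted forms like
{n,m}?.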

GNU grep has a '-P' option that interprets the pattern as a Perl-compatible
regex, provided grep was built with PCRE support.  You could write a
user-defined function that calls it:

--
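# pcre_grep.awk -- use grep -P for PCRE-type regex
{
  # match 'b' at least twice but no more than 3 times, not greedily.
  # grep prints the whole matching line, so greediness only shows in
  # the output if you add -o (print only the matched text).
  pcre("b{2,3}?", $0)
}
# Single-quote S for the shell; each embedded ' becomes '"'"'
function shquote(S) {
  gsub(/'/, "'\"'\"'", S)
  return "'" S "'"
}
function pcre(PAT, REC,    CMD, LINE) {
  # printf instead of echo: echo may mangle backslashes or a leading '-'
  CMD = "printf '%s\\n' " shquote(REC) " | grep -P -e " shquote(PAT)
  # read into LINE so that $0 and NF are left alone
  while ((CMD | getline LINE) > 0)
    print LINE
  close(CMD)
}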
--

This is probably not the best approach (it starts a shell and a grep for
every input record), but it seems to work.
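
Since grep prints the whole line on a match, a greedy and a non-greedy
pattern select the same lines. To see the difference, the function
would have to return the matched text itself, for example with grep's
-o option. A rough sketch along those lines, assuming GNU grep built
with PCRE support; pcre_first, shq and the PAT/REC environment variables
are names I made up for the example:

```shell
#!/bin/sh
# pcre_first PATTERN RECORD: print the first PCRE match in RECORD.
# Sketch only; assumes GNU grep with -P support. The pattern and record
# are passed through the environment because awk -v would process
# backslash escapes in them.
pcre_first() {
  PAT="$1" REC="$2" awk '
    # Quote s for the shell. \047 is a single quote; each embedded one
    # is replaced by: quote, double-quoted quote, quote.
    function shq(s) {
      gsub(/\047/, "\047\"\047\"\047", s)
      return "\047" s "\047"
    }
    # Run grep -oP and return the first match, or "" if there is none.
    function pcre(pat, rec,    cmd, line, out) {
      cmd = "printf \"%s\\n\" " shq(rec) " | grep -oP -e " shq(pat)
      if ((cmd | getline line) > 0)
        out = line
      close(cmd)
      return out
    }
    BEGIN { print pcre(ENVIRON["PAT"], ENVIRON["REC"]) }
  '
}

pcre_first 'b{2,3}?' 'abbbbc'   # non-greedy: prints bb
pcre_first 'b{2,3}'  'abbbbc'   # greedy: prints bbb
```

It still forks a shell and a grep per call, so it is only reasonable for
small inputs.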

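The pcre2-utils package comes with the 'pcre2grep' and 'pcre2test' tools,
which might be used similarly.  Note that PCRE quantifiers are greedy by
default; a trailing '?' makes one non-greedy, and the PCRE2_UNGREEDY option
(PCRE_UNGREEDY in the older library) swaps the two behaviours.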


