Re: pcre in gawk?
From: Z
Date: Mon, 10 Jul 2023 22:25:25 +0000
User-agent: mail v14.9.24
Miriam English <mim@miriam-english.org> wrote:
> Does anybody know if there is any way to use the pcre library to get a
> more extensive regex with gawk? In particular I want to use
> non-greedy matches, such as:
>
> *? Match 0 or more times, not greedily
> +? Match 1 or more times, not greedily
> ?? Match 0 or 1 time, not greedily
> {n}? Match exactly n times, not greedily (redundant)
> {n,}? Match at least n times, not greedily
> {n,m}? Match at least n but not more than m times, not greedily
I don't think there is a gawk extension library for PCRE (yet), but there
is one for TRE, a regex library that adds approximate ("fuzzy") matching:
https://laurikari.net/tre/faq/
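
For the common case where a lazy match only has to stop at a single
delimiter character, plain awk regexes can do without PCRE: a negated
bracket expression such as [^>]* cannot run past the first '>', so
<[^>]*> matches what the PCRE pattern <.*?> would. This is a sketch that
uses only POSIX awk features (match(), RSTART, RLENGTH), so it should
behave the same in gawk; the sample string is made up. It does not
generalize to multi-character delimiters.

```shell
#!/bin/sh
# POSIX awk EREs have no lazy quantifiers, but when the lazy match ends
# at a single delimiter character, [^>]* stops at the first '>' just
# as .*? would in PCRE. The input string is an arbitrary example.
s='<a><b>'

# Greedy: awk regexes match leftmost-longest, so .* runs to the last '>'.
greedy=$(printf '%s\n' "$s" | awk '{ if (match($0, /<.*>/)) print substr($0, RSTART, RLENGTH) }')

# Negated bracket expression: it cannot cross a '>', so it stops at the first.
lazy=$(printf '%s\n' "$s" | awk '{ if (match($0, /<[^>]*>/)) print substr($0, RSTART, RLENGTH) }')

echo "greedy: $greedy"   # greedy: <a><b>
echo "lazy:   $lazy"     # lazy:   <a>
```
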
GNU grep has a '-P' option that interprets the pattern as a Perl-compatible
regular expression, provided grep was built with PCRE support. You could
wrap it in a user-defined function:
--
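# pcre_grep.awk -- use "grep -oP" for PCRE-style regexes
{
    # match 'b' at least twice but no more than 3 times, non-greedily:
    pcre("b{2,3}?", $0)
}
# shq(s): quote s for /bin/sh, turning each ' into '\''
function shq(s) {
    gsub(/'/, "'\\''", s)
    return "'" s "'"
}
# pcre(PAT, REC): print each part of REC that PAT matches. -o makes grep
# print only the matched text, so the non-greedy quantifier is visible.
# Reading into LINE leaves $0 and NF untouched; CMD and LINE are locals.
function pcre(PAT, REC,    CMD, LINE) {
    CMD = "printf '%s\\n' " shq(REC) " | grep -oP " shq(PAT)
    while ((CMD | getline LINE) > 0)
        print LINE
    close(CMD)
}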
--
This is probably not the best approach, since it starts a shell and a grep
process for every input record, but it seems to work.
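
A quick way to see what the lazy quantifier changes, assuming a GNU grep
built with PCRE support (the sample line 'abbbc' is arbitrary):
greediness only shows up with -o, which prints just the matched text;
without -o, grep prints the whole line for either pattern.

```shell
#!/bin/sh
# Assumes GNU grep with -P (PCRE) support.
line='abbbc'

# Greedy {2,3} takes as many b's as it can (3).
greedy=$(printf '%s\n' "$line" | grep -oP 'b{2,3}')

# Lazy {2,3}? takes as few as it can (2).
lazy=$(printf '%s\n' "$line" | grep -oP 'b{2,3}?')

echo "greedy: $greedy"   # greedy: bbb
echo "lazy:   $lazy"     # lazy:   bb
```
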
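The pcre2-utils package comes with a 'pcre2test' tool that might be
used similarly. Note that PCRE quantifiers are greedy by default: the
PCRE2_UNGREEDY option (PCRE_UNGREEDY in the older PCRE library), or (?U)
in the pattern itself, inverts this so that quantifiers are non-greedy
unless followed by '?'.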
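
The inverted behaviour can be checked with grep -P as well, since PCRE
accepts (?U) in a pattern to set the ungreedy option. A sketch, again
assuming a PCRE-enabled GNU grep and an arbitrary sample line:

```shell
#!/bin/sh
# Assumes GNU grep with -P (PCRE) support.
line='abbbc'

# Default: quantifiers are greedy.
plain=$(printf '%s\n' "$line" | grep -oP 'b{2,3}')

# (?U) makes the bare quantifier lazy...
ungreedy=$(printf '%s\n' "$line" | grep -oP '(?U)b{2,3}')

# ...and under (?U) a trailing '?' makes it greedy again.
flipped=$(printf '%s\n' "$line" | grep -oP '(?U)b{2,3}?')

printf '%s %s %s\n' "$plain" "$ungreedy" "$flipped"   # bbb bb bbb
```
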