help-gawk

Re: pcre in gawk?


From: Ben Bacarisse
Subject: Re: pcre in gawk?
Date: Fri, 28 Jul 2023 04:03:30 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Miriam English <mim@miriam-english.org> writes:

> Thanks for your reply. Apologies for my lack of response.
>
> I have been trying to get pcre working with gawk, but so far I've been
> unsuccessful.
>
> Instead, I've fallen back on using pretty simple hacks to do the job.
> For example, to do a non-greedy match the way .*? would do, I've broken
> it into a two-step process, replacing the first instance of the target
> string with a character that's extremely unlikely to be found in any
> text -- hexadecimal character 1b is from the old ASCII character set
> and represented a character to be substituted, so is fitting, I think.
> Then do a normal greedy replace.
>
> echo "1 and 2 and 3" | awk '{sub(/and/,"\x1b");sub(/.*\x1b/,"")}1'
>
> results in:
>  2 and 3
>
> The only potential problem I see is in text that contains multi-byte
> characters where one of the bytes is x1b. I'm not sure how to get
> around that, or if there is a way to protect against it.

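gawk can handle strings with embedded zeros.  If you use \x0 rather than
\x1b you can be sure it does not appear in ordinary text.  In valid UTF-8
every byte of a multi-byte character is 0x80 or above, so neither \x0 nor
\x1b can turn up inside one.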
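
If you would rather not depend on a sentinel byte at all, one possible
alternative (a sketch only, not something tried earlier in this thread) is
to let match() find the leftmost occurrence and cut the line with substr().
The function name strip_through_first is made up for illustration.  Note
that match() is still leftmost-longest, so this only imitates .*? in front
of a target whose own match is unambiguous, such as a literal string.

```shell
# Drop everything up to and including the first match of a regex,
# roughly what s/.*?and// would do in a PCRE-style engine.
# strip_through_first is a hypothetical helper name, not part of awk.
strip_through_first() {
    # $1 is the target regex, passed to awk as a dynamic regex string
    awk -v re="$1" '{
        if (match($0, re))                     # sets RSTART and RLENGTH
            $0 = substr($0, RSTART + RLENGTH)  # keep text after the match
        print                                  # unmatched lines pass through
    }'
}

echo "1 and 2 and 3" | strip_through_first 'and'
```

For the sample input this should give the same " 2 and 3" as the two-step
sub() version, and lines that do not contain the target are printed
unchanged.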

> Alternatively, a rare unicode character, for example a cuneiform
> character, could be used for the substitute character.

-- 
Ben.



