
Re: [bug-gawk] Use gawk and regex to stip comments from html files regex


From: Davide Brini
Subject: Re: [bug-gawk] Use gawk and regex to stip comments from html files regex problem
Date: Thu, 30 Oct 2014 11:29:23 +0100

On Wed, 29 Oct 2014 22:54:25 +0100, "frank ernest" <address@hidden> wrote:

> Hello, I'm trying to use the lazy star to strip comments from html files
> but it's not working: gawk '{ gsub(/<!--.*?-->/, "", $0); print $0}'
> I have to use a non greedy method so that this:
> <!-- comment one --> Important text <!-- comment2 -->
> does not become this:
> 
> See? But, despite the fact that several docs on extended regexes mention
> the fact that the lazy star works it does not work in gawk. I know that I
> might use some other tool like lynx, but I wanted to do it with gawk and
> I don't see why a perfectly fine programming language should fail for so
> simple a task.

First of all, it's not at all a "simple task". Keep in mind that
parsing *ML with regex-based tools is fragile and very hard to do right.
It's much better to use dedicated tools, of which plenty exist.

That being said, try this (not guaranteed to work in 100% of cases):

gawk -v RS='<!--|-->' 'NR%2' file.html

Optionally you can set ORS="" for a more conservative output format (no
newline is inserted where a comment was removed).

(And good luck.)

-- 
D.


