octave-maintainers

Re: sed vs. gsed


From: Rik
Subject: Re: sed vs. gsed
Date: Mon, 19 Sep 2016 16:29:46 -0700

On 09/19/2016 04:05 PM, address@hidden wrote:
Subject: Re: sed vs. gsed
From: Mike Miller <address@hidden>
Date: 09/19/2016 11:45 AM
To: "c." <address@hidden>
CC: Sebastian Schöps <address@hidden>, address@hidden

On Mon, Sep 19, 2016 at 20:34:55 +0200, c. wrote:
> 
> On 19 Sep 2016, at 20:13, Sebastian Schöps <address@hidden> wrote:
> 
> > Rik-4 wrote
> >> But I realize that might be too big a change.  If we stay with the current
> >> system, then why not just require the GNU variants of sed and awk?  We
> >> already require GNU Make, for example.  Basically, there's a lot of scripts
> >> that I don't feel like re-writing to remove the use of GNU regexp.  Not
> >> only would this take time away from more useful Octave tasks (since our
> >> volunteer time is finite), but it would also potentially introduce new bugs
> >> in the build system which would then need to be debugged, taking yet more
> >> developer time.
> > 
> > I agree but then configure should properly detect gawk and gsed and complain
> > if they were not found. Furthermore, it should be made sure that octave is
> > actually using the detected binaries, i.e., not some sed or awk that happens
> > to be in the path :) 
> > 
> > I think I proposed this in some of the threads on "odepkg on Mac" where I
> > came across the gawk problem (which throws at least an error message during
> > configure). 
> > 
> > Bye
> > Sebastian
> 
> Yes, 
> 
> I agree with Sebastian, it is not such a big hassle to require gsed+gawk,
> what causes a big deal of problems is that there is no check in configure to
> make sure the detected sed+awk have the required features.
> 
> So we should decide whether any sed or gawk will do or, if we require a specific
> version, we should check that version is actually available.

I'm good with that.

I would suggest (and may start working on) a test for specific features,
rather than a --version string, for example

  if test "$(echo one | $SED -e 's/\(one\|two\)/zero/')" != zero; then
    AC_MSG_ERROR([Octave can't build with this sed program, try using GNU sed])
  fi
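Outside of configure, the same feature probe can be tried by hand. The following is only a minimal sketch in plain POSIX shell: the function name `sed_supports_alternation` is made up for illustration and is not part of Octave's build system.

```shell
#!/bin/sh
# Hypothetical helper (not from Octave's sources): succeed if the program
# named in $1 rewrites "one" to "zero" using "\(X\|Y\)" alternation.
sed_supports_alternation () {
  result=$(echo one | "$1" -e 's/\(one\|two\)/zero/' 2>/dev/null)
  test "$result" = zero
}

# Probe whatever $SED points at, defaulting to plain "sed".
SED=${SED:-sed}
if sed_supports_alternation "$SED"; then
  echo "$SED supports alternation"
else
  echo "$SED lacks alternation; try GNU sed" >&2
fi
```

Testing the behaviour rather than parsing a --version string means any sed that handles the construct is accepted, whatever it calls itself.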

The interesting / surprising thing about autoconf is that it prefers
"gawk" to other awk possibilities, but it looks for "sed" before "gsed".
I'll see if I can figure out how to override or supersede that builtin
logic. Otherwise, non-GNU users will always have to set SED=gsed when
building, which I don't like.

Mike,

The autoconf documentation is useful here.  In configure.ac we use AC_PROG_AWK (https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Particular-Programs.html).  According to the documentation,

— Macro: AC_PROG_AWK

Check for gawk, mawk, nawk, and awk, in that order, and set output variable AWK to the first one that is found. It tries gawk first because that is reported to be the best implementation. The result can be overridden by setting the variable AWK or the cache variable ac_cv_prog_AWK.

Using this macro is sufficient to avoid the pitfalls of traditional awk (see Limitations of Usual Tools).

For sed, we do not use AC_PROG_SED, but instead wrote our own macro OCTAVE_PROG_SED which is in m4/acinclude.m4.  The top of this macro has

dnl
dnl Find sed program.
dnl
# Check for a fully-functional sed program, that truncates
# as few characters as possible and that supports "\(X\|Y\)"
# style regular expression alternation.  Prefer GNU sed if found.

Clearly, we are already trying to test for a sed that supports alternation, yet for some reason that test is failing?
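To make the intended behaviour concrete, here is a rough sketch of a search that prefers gsed and only accepts a candidate that passes the alternation probe. It is not the actual OCTAVE_PROG_SED macro, which is not reproduced in this thread; the function name `find_working_sed` and the candidate order are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate that is runnable and
# passes the "\(X\|Y\)" alternation probe; fail if none does.
find_working_sed () {
  for candidate in "$@"; do
    # Skip names that cannot be found or executed at all.
    command -v "$candidate" >/dev/null 2>&1 || continue
    out=$(echo one | "$candidate" -e 's/\(one\|two\)/zero/' 2>/dev/null)
    if test "$out" = zero; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Honor a user-supplied SED first, then prefer gsed over plain sed.
if SED=$(find_working_sed ${SED:+"$SED"} gsed sed); then
  echo "using sed: $SED"
else
  echo "no sed supporting alternation found; try installing GNU sed" >&2
fi
```

With this ordering, a system that has both a limited sed and a GNU gsed would pick gsed without the user having to set SED=gsed by hand.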

Also, if you check the Limitations of Usual Tools link, you will find:

"Portable sed regular expressions should use ‘\’ only to escape characters in the string ‘$()*.0123456789[\^n{}’. For example, alternation, ‘\|’, is common but Posix does not require its support, so it should be avoided in portable scripts. Solaris sed does not support alternation; e.g., ‘sed '/a\|b/d'’ deletes only lines that contain the literal string ‘a|b’. Similarly, ‘\+’ and ‘\?’ should be avoided."
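For comparison, this is roughly what avoiding alternation looks like in simple cases: one expression per alternative. The helper names below are invented for illustration, and the sketch covers only the two patterns mentioned in this thread.

```shell
#!/bin/sh
# Portable stand-in for sed '/a\|b/d': delete lines containing "a" or "b"
# by giving each alternative its own expression.
delete_a_or_b () {
  sed -e '/a/d' -e '/b/d'
}

# Portable stand-in for sed 's/\(one\|two\)/zero/' on lines that contain
# at most one of the two words.
one_or_two_to_zero () {
  sed -e 's/one/zero/' -e 's/two/zero/'
}

printf 'a\nb\nc\n' | delete_a_or_b
printf 'one\ntwo\nthree\n' | one_or_two_to_zero
```

The second rewrite is not an exact equivalent: on a line containing both "one" and "two" it replaces both words, whereas the alternation form replaces only the first match. Differences like this are one way a mechanical rewrite of existing scripts could introduce new bugs.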

This is a known problem.  Either we have to write all of our sed scripts in a manner that might be accepted by a Cray computer from 1960, or we require a more modern sed, preferably gsed.

--Rik

