Re: RFC: new changeresyntax builtin
From: Eric Blake
Subject: Re: RFC: new changeresyntax builtin
Date: Wed, 5 Jul 2006 21:53:05 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Gary V. Vaughan <gary <at> gnu.org> writes:
>
> I'm thinking of removing epatsubst, eregexp and erenamesyms in HEAD, in
> favour of a more flexible and scalable changeresyntax builtin as an
> analogue to re_set_syntax in the GNU regex C API.
>
> If a bogus operand is given:
>
> changeresyntax(`meh')
> => stdin:1: m4: ERROR: unknown argument to built-in `changeresyntax';
> use one of: AWK, ED, EGREP, EMACS, GNU_AWK, GREP, POSIX_AWK,
> POSIX_BASIC, POSIX_EGREP, POSIX_EXTENDED, SED.
I like it! It is similar to the -regextype primary recently added to findutils
4.2.24. And it goes well with the regexprops-generic.texi in gnulib, which
documents all the high-level regular expression families in GNU programs. And
what about POSIX_MINIMAL_BASIC?
>
> This replaces 3 builtins with one more powerful builtin, an obvious
> win to my mind. Can anyone see a downside to this change?
A few issues need to be resolved first.
One - autoconf documents m4_bpatsubst as mapping to m4's patsubst, with the
note that m4_patsubst is reserved for the day that m4 introduces epatsubst. We
need to make sure that repeated use of changeresyntax is efficient. With your
proposal, autoconf will have to do something like:
define(`m4_patsubst', `changeresyntax(`POSIX_EXTENDED')'defn(`patsubst'))
define(`m4_bpatsubst', `changeresyntax(`EMACS')'defn(`patsubst'))
(which implies that we will need to fix the mixing of text and builtins in a
single definition; or else expand the above example into using helper macros).
Two - what about case-insensitive regular expressions? Again, using findutils
as an example, it provides -regex and -iregex as the two primaries affected by
-regextype. So we should really have 7 regex builtins in m4:
patsubst, regex, renamesyms, ipatsubst, iregex, irenamesyms, changeresyntax.
Three - is changeresyntax(`emacs') the same as changeresyntax(`EMACS')? Should
we accept unambiguous prefixes, like changeresyntax(`em')?
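To make question three concrete, here is a rough sketch of one possible answer: case-insensitive names, with unambiguous prefixes accepted. It is modelled in Python rather than m4, since the builtin does not exist yet. The list of names is taken from the error message quoted above; the function name `resolve_syntax` and the rule that an exact match wins over a prefix match are my own assumptions, not anything in Gary's patch.

```python
# Syntax names as listed in the quoted error message.
SYNTAXES = [
    "AWK", "ED", "EGREP", "EMACS", "GNU_AWK", "GREP", "POSIX_AWK",
    "POSIX_BASIC", "POSIX_EGREP", "POSIX_EXTENDED", "SED",
]


def resolve_syntax(name):
    """Map a user-supplied name to one canonical syntax name.

    Hypothetical policy: compare case-insensitively, prefer an exact
    match, otherwise accept a prefix only if it selects one name.
    """
    key = name.upper()
    if key in SYNTAXES:
        return key
    matches = [s for s in SYNTAXES if s.startswith(key)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown syntax: %r" % name)
    raise ValueError("ambiguous syntax %r: %s" % (name, ", ".join(matches)))
```

Under this policy `emacs` and `em` both select EMACS, while `e` and `posix_e` are rejected as ambiguous. One cost of accepting prefixes is that adding a new syntax name later can turn a previously valid abbreviation into an error.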
Four - what should the default be? Do we stick with EMACS syntax, for 1.4.x
compatibility, or do we go for broke and make the default POSIX_EXTENDED?
Whatever we choose, we should probably also have a command-line option to set
the default.
Five - it looks like you already have a patch started. Don't forget to add the
current resyntax to frozen files, since it should be saved across loads. And
how would this interact if the state in the frozen file and the state requested
by the command line differ on reload?
Six - is it also worth adding an optional parameter to the existing regex
builtins? I'm thinking along the lines of:
patsubst(string, regexp, replacement, opt syntax)
That optional syntax parameter could also serve as the place to request flags
like case-insensitive or global vs. first match only (kind of like perl's
s///ig). Then you would only need four regex primitives (changeresyntax,
patsubst, regex, and renamesyms) rather than seven, because the case-insensitive
variants become unnecessary. For example, autoconf could then do something
like:
define(`m4_bpatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', `EMACS')')
define(`m4_patsubst', `m4_builtin(`patsubst', `$1', `$2', `$3',
`POSIX_EXTENDED')')
define(`m4_ipatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3',
`POSIX_EXTENDED,insensitive')')
It still makes sense to provide changeresyntax, even if you add the optional
parameter to the other three builtins, so that you don't have to specify the
syntax on every call.
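As a rough illustration of how the optional parameter from point six might be interpreted, here is a small Python model (not m4, since none of this is implemented). The comma-separated form and the `insensitive` flag come from the m4_ipatsubst example above; the `global` flag name, the function name `parse_syntax_spec`, and the fallback to whatever changeresyntax last set are assumptions made for the sketch.

```python
# Flag words recognised in the optional argument. "insensitive" appears in
# the example above; "global" is a hypothetical name for the global vs.
# first-match-only choice.
KNOWN_FLAGS = {"insensitive", "global"}


def parse_syntax_spec(spec, current_syntax="EMACS"):
    """Split a spec such as 'POSIX_EXTENDED,insensitive'.

    Returns (syntax, flags). If the spec names no syntax, fall back to
    current_syntax, standing in for the state set by changeresyntax.
    """
    syntax = current_syntax
    flags = set()
    for part in spec.split(","):
        word = part.strip()
        if not word:
            continue  # tolerate an empty or omitted argument
        if word.lower() in KNOWN_FLAGS:
            flags.add(word.lower())
        else:
            syntax = word.upper()
    return syntax, flags
```

With this reading, an omitted or empty fourth argument leaves both the syntax and the flags at their defaults, which is why keeping changeresyntax around remains useful.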
--
Eric Blake