bug-gnu-utils

Re: config files substitution with awk


From: Ralf Wildenhues
Subject: Re: config files substitution with awk
Date: Wed, 6 Dec 2006 14:38:42 +0100
User-agent: Mutt/1.5.13 (2006-11-01)

* Pascal Bourguignon wrote on Wed, Dec 06, 2006 at 11:00:51AM CET:
> Ralf Wildenhues <address@hidden> writes:
> >
> >   s/@var1@/@|#_!!_#|var2@/g
> >   s/@var2@/text2/g
> >   ...
> >   s/|#_!!_#|//g

> Yes.  I was pointing to the semantics of sed, not to the restricted
> usage autoconf needs.  With this latter s/|#_!!_#|//g we still have the
> problem, if we must use a generic sed.  IMO, it would be better to use
> a specific tool, since it could be easily implemented in
> O(length(input-file)), and wouldn't even need to implement
> sophisticated DFA at all (given the @..@ convention).
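To make the marker trick in the quoted script concrete, here is a small Python model of a chain of literal `s/.../.../g` commands. Python's `str.replace` stands in for sed (in a sed BRE, `|`, `#`, `_` and `!` are all literal), and the helper `run_sed_like` and the variable names are hypothetical. The model shows why a value that itself looks like `@var2@` gets substituted a second time without the marker, and that every command costs another full pass over the line. The trick assumes the marker string never occurs in the input or in any value; otherwise the final strip would corrupt that text.

```python
MARKER = "|#_!!_#|"

def run_sed_like(line, commands):
    """Apply literal global substitutions in order, like a chain of s///g commands.

    Each command rescans the whole line, so V commands mean V passes.
    """
    for pattern, replacement in commands:
        line = line.replace(pattern, replacement)
    return line

# var1's value is the literal text "@var2@"; var2's value is "text2".
naive = [("@var1@", "@var2@"), ("@var2@", "text2")]

# Same script, but var1's value is protected by the marker, which is
# stripped in a final pass (the s/|#_!!_#|//g step).
protected = [("@var1@", "@" + MARKER + "var2@"),
             ("@var2@", "text2"),
             (MARKER, "")]

if __name__ == "__main__":
    print(run_sed_like("@var1@", naive))      # cascades into var2's value
    print(run_sed_like("@var1@", protected))  # keeps var1's literal value
```

With the naive script, `@var1@` ends up as `text2`; with the protected one it correctly ends up as `@var2@`.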

I'm not sure if we're talking past each other or simply in violent
agreement.  My reasoning is as follows:

- problem: Autoconf-generated config.status scripts are slow for large
  packages.
- analysis: it uses a suboptimal sed-based algorithm for substitution.
- Any solution to the problem must be extremely portable, so it should
  adhere to POSIX, the GNU Coding Standards, and also take into account
  further known limitations of real-world systems (the Autoconf manual
  has a guide for portability issues).
- awk is portable and available everywhere, and it allows for a better
  algorithm: we can exploit the hashing used in array index lookup.
- According to Paul, it's OK to assume (ancient) V7 awk.
- result: use portable awk to accomplish the same task.
  Fix the GNU Coding Standards to allow awk, so we comply.
- outlook: more modern awk could allow for an algorithm with even
  better asymptotic scaling, as outlined in [1].  But for real-world
  configure scripts, this step doesn't seem necessary yet.
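As a rough illustration of the lookup-based substitution summarized in this list, here is a minimal Python sketch in which a dict plays the role of an awk array. It is modeled on the description in this message (split each line on `@`, try each field as a key), not copied from Autoconf's actual config.status code, and the name `substitute_line` is hypothetical. Because substituted values are appended to the output and never rescanned, no marker is needed.

```python
def substitute_line(line, subst):
    """Replace every @key@ whose key is in `subst`, in one left-to-right pass."""
    fields = line.split("@")
    out = [fields[0]]
    i = 1
    # A key needs a closing '@', so the last field can never be a key.
    while i < len(fields) - 1:
        key = fields[i]
        if key in subst:  # hash lookup, as with an awk array index
            # Emit the value, then the text after the closing '@'.
            out.append(subst[key])
            out.append(fields[i + 1])
            i += 2
        else:
            # Unknown key: keep its opening '@' literally and let the
            # next field be tried as a key.
            out.append("@" + key)
            i += 1
    # If the final field was not consumed, restore its leading '@'.
    if i == len(fields) - 1:
        out.append("@" + fields[i])
    return "".join(out)

if __name__ == "__main__":
    values = {"prefix": "/usr/local", "VERSION": "1.0"}
    print(substitute_line("prefix = @prefix@", values))
    print(substitute_line("mail user@host about @VERSION@", values))
```

Stray `@` characters (as in an e-mail address) pass through unchanged, and a value such as `@var2@` is emitted literally rather than substituted again.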

So are you now saying this job can be done even better, without
resorting to awk?  If so, could you give details?

Note that the current code doesn't use a regex engine _at all_.  It
simply splits the input on `@' characters.  Each substring between
two such characters is then used for index lookup in an awk array.
Depending on the index set I of the array and the quality of the awk
implementation, each lookup typically costs either O(log |I|) or
constant time.  The latter would correspond to your
O(length(input-file)).  In practice, the difference is largely lost
in the noise.
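The two cost regimes can be sketched in Python: a sorted key list searched with `bisect` needs about log2(|I|) comparisons per lookup, while a dict (a hash table) needs an expected constant number of probes, independent of |I|. Which structure any particular awk uses for its arrays is not claimed here; the helper names `make_sorted_index` and `lookup_sorted` are hypothetical.

```python
import bisect

def make_sorted_index(subst):
    """Build a sorted (keys, values) pair: the O(log |I|) lookup variant."""
    keys = sorted(subst)
    return keys, [subst[k] for k in keys]

def lookup_sorted(index, key):
    """Binary search: about log2(|I|) string comparisons per lookup."""
    keys, values = index
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[pos]
    return None

if __name__ == "__main__":
    subst = {"prefix": "/usr/local", "VERSION": "1.0", "CC": "gcc"}
    index = make_sorted_index(subst)
    # Both variants give the same answers; only the cost per lookup differs.
    for key in ["CC", "prefix", "missing"]:
        print(key, lookup_sorted(index, key), subst.get(key))
```

For a few hundred substitution variables, log2(|I|) is under 10, which is consistent with the difference being lost in the noise.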

But maybe you misread me as saying that Autoconf falls back on
generic sed: it does not.  That was merely a suggestion of mine for
bootstrapping one specific package, GNU gawk.  Paul argued that this
is not necessary.  So I'm done.

Paolo already addressed the proposal of adding optimization to GNU sed.

Cheers,
Ralf

[1] http://lists.gnu.org/archive/html/autoconf-patches/2006-11/msg00049.html



