Re: [Gnu-arch-users] GNU Arch status update


From: Tom Lord
Subject: Re: [Gnu-arch-users] GNU Arch status update
Date: Wed, 9 Feb 2005 10:10:37 -0800 (PST)


   From: Ben Finney <address@hidden>

   On 06-Feb-2005, Tom Lord wrote:
   > There are (at most) three regexp engines in the free software world
   > with anywhere near enough performance to work for `inventory'.  Of
   > those, only 1 is in a free version of libc (and, of the three, with
   > considerable respect for its author, i must note that it is the one I
   > think most likely to have performance problems for uses within arch
   > but outside of `inventory' itself).

   > In other words, the decision to use the system regex is pragmatically
   > equivalent to the decision to rely on GNU libc.

   Is your objection based on libc's regex engine, or on other aspects of
   libc?

A little from column A and a little from column B.

One danger of having to link against a specific `libc' is that it
damages (pragmatic) portability rather severely: one is constrained
to systems to which that `libc' has been ported and on which it is
installed.

Another danger of relying on a specific `libc', especially one
seemingly designed to include as many compatibility features, bells,
and whistles as possible, is that *additional* dependencies on that
`libc' will creep into the application.   Those extra dependencies
are less likely to be discovered if, when they are added, nobody is
bothering any more to link against any other `libc'.


   If the dependency was on a separate-from-libc version of GNU's regex
   library, would that be satisfactory?

Certainly much closer.   That would reduce my objections to a concern
about whether `arch''s regexp features are constrained to those
defined by Posix.

I don't think that committing permanently to the Posix regexp standards
is a good idea.   There are some non-standard regexp constructs and
interfaces that I think are much more practical.   There remains an
open question about how regexp use in arch will evolve as support
for extended character sets is added.

At the moment, `tla' does use only standard Posix regexp operators
and interfaces.   As a practical matter, bundling a second regexp
engine may be a very good idea -- for now.
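
For concreteness, here is a minimal sketch of what "standard Posix
regexp operators and interfaces" amounts to in C: compile with
`regcomp', match with `regexec', release with `regfree', all from
<regex.h>.  The helper name `posix_ere_matches' is made up for
illustration and is not taken from the `tla' sources.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (an assumption, not code from tla).
   Returns 1 if `text' matches the Posix extended regexp `pattern',
   0 if it does not, and -1 if the pattern fails to compile. */
int
posix_ere_matches (const char *pattern, const char *text)
{
  regex_t compiled;
  int status;

  assert (pattern != NULL && text != NULL);

  /* REG_EXTENDED selects Posix extended syntax.
     REG_NOSUB asks only for a yes/no answer, no submatch offsets. */
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  status = regexec (&compiled, text, 0, NULL, 0);
  regfree (&compiled);

  return status == 0 ? 1 : 0;
}
```

Code written only against these three calls should, in principle,
build against any conforming regexp engine, whether it comes from the
system `libc' or from a bundled library.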

But note that at least three core developers are now aware of
probable medium-term future developments in which `tla' regexp performance
is significantly improved by using some non-Posix features from Rx.

It's a nice theory (and nice in practice /when/ the theory holds true)
that Posix is the last word in regexp interfaces and that implementations
are just plug-n-play.  In reality, it is difficult to find widely
ported and widely used regexp-intensive programs which do not
undermine that theory.  For example, here is a short list of
regexp-intensive programs that are bundled with customized regexp engines,
each offering non-standard features:

        GNU [ef]?grep (and some other implementations, too, I think)
        GNU awk
        GNU emacs
        Perl
        Tcl

-t

