gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] Re: tla1.1 plans


From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: tla1.1 plans
Date: Tue, 16 Sep 2003 08:19:39 -0700 (PDT)

    > From: Miles Bader <address@hidden>

    > "Stephen J. Turnbull" <address@hidden> writes:
    > >     Miles>   junk \.foo$
    > >     Miles>   junk \.bar$

    > >     Miles> ... and have them automatically concatenated into
    > >     Miles> `\.foo$|\.bar$';
    > > Except that you don't want to actually concatenate them (except as an
    > > optimization); it should be possible to interleave them with other 
types.

    > Do you mean that it should match the the patterns in the order they
    > occur in the file, returning the first type that matches?  Does
    > =tagging-method do that now?

No, though that is one aspect of the generalization proposed in =TODO.

    > Another thing to wonder about for concatenation is whether rx can
    > optimize the resulting expressions, in particular, whether it can factor
    > the trailing `$' out of the or-expression (if not, it would be simple
    > enough to heuristically optimize a few simple cases resulting from
    > concatenation before calling rx).

Rx does optimize the expression that way.   In other words, aside from
from subpattern reporting:

     ^(a|b|c)$
and 

    ^a$|^b$|^c$

are operationally equivalent (and since subpattern reporting isn't
used in this application, the two are operationally equivalent for
`inventory' purposes).

It's also possible to treat ordered patterns "like lex".   Thus:

        junk A
        source B
        junk C

        =>

        A[[:cut 3:]]|B[[:cut 2:]]|C[[:cut 1:]]

returns an extra value (the "final state label") return 3 if A
matched, 2 if B matched, 1 if C matched.  If it's ambiguous -- say,
both A and C matched -- the label with greater magnitude is returned
(3).

It is one of the points of the full =tagging-method generalization
that all of the patterns (all categories, in whatever order, matching
relative paths or basenames) could be compiled into one big pattern
that must match the entire relative path.

Moreover, if we restricted the pattern language to prohibit
backreferences and counted iterations, there'd be two consequences:

1) it would be guaranteed that matches would take only a single pass
   over each relative path

2) actually, it would take less than a single pass over each relative
   path.  During traversal, a partial match of the directory-part of a
   relative path name can be computed just once for each directory.
   The resulting state machine state can be cheaply cloned for each
   file in the directory which is then matched with a single pass over
   only the basename.

Rx has features for nearly all of that already.  

The biggest new part that would be needed is the part that compiles
the =tagging-directives into "one big regexp".  That's
straightforward, conceptually, but has enough nebbishly little details
to make it a pain in the neck.

The other biggest new part would be gutting the big loop in
`inventory_traversal', of course.

Another interesting consequence of going down that path, though, is
that the result would quickly turn into "all implementations of arch
need to use Rx or an off-line lexical analyzer compiler unless they
want to be painfully slow."    

-t




reply via email to

[Prev in Thread] Current Thread [Next in Thread]