
Re: [Gnu-arch-users] Slow inventories on large source trees


From: Tom Lord
Subject: Re: [Gnu-arch-users] Slow inventories on large source trees
Date: Wed, 21 Apr 2004 11:42:39 -0700 (PDT)

    > From: Aaron Bentley <address@hidden>

    > > 1) measure the speed of GNU `find' on this tree with 

    > >         find . '!' -uid 0

    > >    (I'm assuming that root does not own any files in this tree.  The
    > >     find expression is to force `find' to stat files.)

    > Is that necessary?  With names tagging, it shouldn't need to stat 
    > anything, should it?

Yes, it's necessary.   `find' can sometimes get by with just a `chdir'
that might fail but `inventory' can not.
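If GNU `find' isn't handy, a rough stand-in for the same baseline is a
script that walks the tree and stats every entry.  This is only a
sketch: `stat_walk' is a made-up name, not anything in arch, and the
explicit lstat is there for the same reason as the `-uid' test above,
namely to make sure every entry really gets stat'ed.

```python
import os
import time


def stat_walk(root):
    """Walk `root`, lstat every entry, and return (entry_count, seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # os.walk alone may avoid a stat per entry, so force one here.
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start
```

Calling stat_walk(".") at the top of the tree gives a number to
compare against the time `inventory' takes on the same tree.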

    > [other message]

    > It's calling filename_matches 89750 times -- about 21 times per file in 
    > the Wine tree, so I suspect that can be reduced, hopefully to single 
    > digits.

You mentioned elsewhere that that's a `changes' profile, not
`inventory' -- so you're counting _2_ inventories.   The actual
average is about 10.5 calls per file per inventory.

You elsewhere mentioned something about ~230 .arch-inventory files, so
I'm assuming nearly every directory has one.

Currently, the tests performed by a traversal during `changes', for
each file in a directory containing a .arch-inventory file, are:

is it a control file?
(_NOT_ is it user-defined exclude?)
is it .arch-inv junk?           
is it .arch-inv backup?         
is it .arch-inv precious?       
is it .arch-inv unrec?          
is it .arch-inv junk?           
is it .arch-inv source?         
is it junk?                     
is it backup?                    
is it precious?
is it unrec?
is it source?


That's 12 calls per file (the user-defined exclude test is not
performed), except that the sequence is truncated for non-source files
and is shorter for dirs lacking .arch-inventory.  So, 10.5 makes
perfect sense.
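To see why a cascade like this costs the most for source files, here
is a small sketch that counts pattern tests per file name.  The
patterns are invented for illustration (they are not arch's real
defaults), and the sketch assumes the cascade stops at the first
category that matches; it models only the last five tests in the list.

```python
import re

# Hypothetical category patterns, tried in this order (not arch's defaults).
CATEGORIES = [
    ("junk", re.compile(r"^,.*$")),
    ("backup", re.compile(r"^.*(?:~|\.bak|\.orig|\.rej)$")),
    ("precious", re.compile(r"^\+.*$")),
    ("unrecognized", re.compile(r"^.*\.(?:o|a|so)$")),
    ("source", re.compile(r"^[_=a-zA-Z0-9].*$")),
]


def classify_counting(name):
    """Return (category, number_of_pattern_tests) for a file name."""
    calls = 0
    for label, pattern in CATEGORIES:
        calls += 1  # one separate match call per category tried
        if pattern.match(name):
            return label, calls
    return None, calls
```

Under these assumptions a junk name such as ",tmp" costs one test,
while a source name such as "main.c" pays for all five, which is why
a tree made up mostly of source files sits near the top of the range.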

The last five calls can be replaced with a single call.

The six .arch-inv calls can be replaced with a single call.

That will mean that this example goes from average case 10.5 to
worst-case 3 calls to filename_matches.

Abentley, are you interested in working on this?  Do you know about the
`cut' operator in Rx?  (I.e., you don't want to combine those
filename_matches calls by adding parentheses to regexps.  You want to
arrange for the final state label of the dfa to tell you which pattern
matched.)
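The shape of the interface I have in mind is "one call in, category
label out".  The sketch below shows only that shape.  It is Python,
whose `re' module has nothing like final-state labels, so it falls
back on named groups, which is exactly the parenthesized approach to
avoid in Rx; the patterns and the name `classify' are made up for
illustration.

```python
import re

# Hypothetical category patterns in priority order (not arch's defaults).
PATTERNS = [
    ("junk", r"^,.*$"),
    ("backup", r"^.*(?:~|\.bak|\.orig|\.rej)$"),
    ("precious", r"^\+.*$"),
    ("unrecognized", r"^.*\.(?:o|a|so)$"),
    ("source", r"^[_=a-zA-Z0-9].*$"),
]

# One combined pattern; each alternative is wrapped in a named group so the
# match object can report which category matched.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (label, pattern) for label, pattern in PATTERNS)
)


def classify(name):
    """Return the category label for `name` using a single match call."""
    m = COMBINED.match(name)
    return m.lastgroup if m else None
```

Alternatives are tried left to right, so earlier categories keep their
priority: classify("foo.o") reports "unrecognized" even though the
source pattern would also accept that name.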

-t




