bug-tar

Re: [Bug-tar] Excluding directories based on content


From: Dr. David Alan Gilbert
Subject: Re: [Bug-tar] Excluding directories based on content
Date: Sun, 19 Nov 2006 01:34:31 +0000
User-agent: Mutt/1.5.9i

Joerg Schilling wrote:

> "Dr. David Alan Gilbert" <address@hidden> wrote:
> 
> > Hi,
> >   Is there a way to exclude a directory based on the content
> > of that directory?  In particular I'm after a way to say
> > 'don't back up a directory if it contains a file called .....'
> > (where ..... is a regexp or at least a list of a few filenames).
> > I see there is support explicitly for a cache dir exclusion,
> > but I was looking for something a little more generic?
> 
> In general this is impossible or hard to implement because this would
> need to scan a directory "in advance".

I can see that would be needed for regexp matching, but for a fixed list
of exclude names it could be implemented in the same way that cache dir
exclusion is done at the moment.  So without regexp, I think that if I
were to create a list of names to avoid, I could check in dump_dir0
whether any of those files exists, at the same point where it currently
calls check_cache_directory; I would just attempt to stat
dirname/excludename and, if it exists, skip the directory.
It looks reasonably simple to do.

> It you believe that this could be implemented using find(1), get star:
> 
> ftp://ftp.berlios.de/pub/star/alpha/
> 
> try:
> 
> star -c f=/tmp/some.tar -find <find expression>

I can't immediately see how to do this with a find expression, since I
can't see how to test for the existence of a file within the directory
(if the current filename passed to the find parser is the directory).

Our current solution for this is to run a find on the directory, searching
for the exclusion markers (which are actually just two filenames),
and, with a little sed to cut those names off the output, generate
an exclusion list of directories for tar (we also add a starter list
of a few patterns to exclude core files and the like).  The problems with
this are that firstly the extra traversal of the filesystem tree is
slow, and secondly some of the filesystems I back up like this contain
many hundreds (possibly thousands?) of exclude markers, and the tar is
running excruciatingly slowly (in this case about 7 hours). (To be fair,
I haven't profiled it to see whether it is the matching code or not.)

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
