rdiff-backup-users

Re: [rdiff-backup-users] Resource Forks and metadata file


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Resource Forks and metadata file
Date: Fri, 28 Nov 2003 17:45:39 -0800

On Fri, 28 Nov 2003 13:42:38 -0800 Daniel Hazelbaker <address@hidden> wrote:
> A sample of the mirror metadata is as follows. From the second backup the
> .snapshot.gz metadata file is 7,671,808 bytes. Uncompressed it is 26,968,064
> bytes. Grepped for just ResourceFork lines it is 26,853,164 bytes.  That is
> just the small "diff" file.  The full metadata file of the first snapshot is
> 817MB compressed.  I'm not even going to bother trying to uncompress it, I
> am sure I will get about the same ratios.

Hmm, so the resource fork data is 99%+ of the total?  When you added the
resource fork support I assumed that not many files would have resource
forks, and that they would be pretty small.
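
For reference, the arithmetic behind that figure can be sketched as
below. The three byte counts and the 817MB size are the ones quoted
above; the extrapolation to the full snapshot assumes, as Daniel does,
that the first snapshot compresses at about the same ratio. The
variable names are just illustrative.

```python
# Sizes quoted for the second backup's .snapshot.gz metadata file.
compressed = 7_671_808      # bytes, gzip-compressed
uncompressed = 26_968_064   # bytes, after decompression
fork_lines = 26_853_164     # bytes, ResourceFork lines only

# Share of the uncompressed metadata taken up by resource fork lines.
fork_share = fork_lines / uncompressed

# Compression ratio observed on the small file.
gzip_ratio = uncompressed / compressed

# ASSUMPTION: the 817MB full metadata file has the same ratios.
full_compressed_mb = 817
est_full_mb = full_compressed_mb * gzip_ratio
est_fork_mb = est_full_mb * fork_share

print(f"fork share: {fork_share:.2%}")
print(f"gzip ratio: {gzip_ratio:.2f}x")
print(f"estimated fork data in full snapshot: {est_fork_mb:.0f} MB")
```

Under that assumption the resource fork lines come to a bit under 3GB
of uncompressed metadata.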

> Here is the question. Is there a way to index this file (store the offset
> into the file say, and if we need the resource fork later go back and load
> it) or some other method of "on demand" loading of the fork data out of the
> metadata file? I think even if I had to store the snapshot files
> uncompressed I could live with something like that.  The other option I can
> think of is to store (again with some kind of index) the actual resource
> fork data in a separate metadata file and again load on demand. 99% of the
> files with resource forks are things like Entourage color information files.
> They will never change and never need to be diff'd because the dates & size
> will match.  There has to be a way to speed this up...

Yes, I guess it makes sense to store it in separate file(s).  ACLs and
EAs are already stored like this, so it wouldn't be that big a deal to
have a separate resource_forks.<...> file.  However, such a file would
have to be recreated each session, so if it's big then a lot of time
would be wasted reading/writing it and lots of space would be wasted
saving old copies.
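
For comparison, the "index and load on demand" idea from the question
could look roughly like the sketch below. It is not rdiff-backup code.
It assumes an uncompressed, line-oriented metadata file in which a
"File <path>" line starts each record and the fork is stored as hex on
an indented "ResourceFork" line; the function names build_fork_index
and load_fork are made up for this example.

```python
import os
import tempfile

FORK_PREFIX = b"  ResourceFork "  # ASSUMED record layout, see above


def build_fork_index(metadata_path):
    """Map each file path to the (offset, length) of its ResourceFork line."""
    index = {}
    current = None
    with open(metadata_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line.startswith(b"File "):
                current = line[len(b"File "):].rstrip(b"\n")
            elif line.startswith(FORK_PREFIX) and current is not None:
                index[current] = (offset, len(line))
    return index


def load_fork(metadata_path, index, path):
    """Seek straight to one record and decode only that fork."""
    offset, length = index[path]
    with open(metadata_path, "rb") as f:
        f.seek(offset)
        line = f.read(length)
    return bytes.fromhex(line[len(FORK_PREFIX):].strip().decode("ascii"))


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    meta = os.path.join(tmp, "mirror_metadata.snapshot")
    with open(meta, "wb") as f:
        f.write(b"File foo/bar\n  Type reg\n  ResourceFork 0001ff\n")
        f.write(b"File foo/baz\n  Type reg\n")
    index = build_fork_index(meta)
    fork = load_fork(meta, index, b"foo/bar")
```

The index holds only two integers per file, so the fork data itself is
never kept in memory. Byte offsets like these only work on an
uncompressed file, which is the trade-off mentioned in the question.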

Extrapolating, you have about 3GB of resource forks, does that sound
right?  Maybe then we need a separate resource_fork_dir.<time>/
directory, which has the same directory structure as the mirror
directory, but just contains resource forks.  So file foo/bar would have
its resource fork stored in
rdiff-backup-data/resource_fork_dir.<time>/foo/bar.  If a file's
resource fork hasn't changed we could take a page from the rsync people
and hard link the old file to the new location.  This should be easier
than recreating the whole .snapshot + .diff scheme.
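
A minimal sketch of the hard-link idea, to make it concrete. This is
not existing rdiff-backup code: the function name store_fork is
hypothetical, ".1" and ".2" stand in for the <time> suffix, and the
"unchanged" test here simply compares bytes, where a real version would
more likely go by date and size.

```python
import os
import tempfile


def store_fork(rel_path, fork_bytes, prev_dir, new_dir):
    """Store one resource fork under new_dir.

    If prev_dir holds an identical fork for rel_path, hard link it
    instead of writing a second copy. Returns "linked" or "written".
    """
    new_path = os.path.join(new_dir, rel_path)
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    if prev_dir is not None:
        old_path = os.path.join(prev_dir, rel_path)
        if os.path.isfile(old_path):
            with open(old_path, "rb") as f:
                unchanged = f.read() == fork_bytes
            if unchanged:
                os.link(old_path, new_path)  # no extra data blocks used
                return "linked"
    with open(new_path, "wb") as f:
        f.write(fork_bytes)
    return "written"


# Two sessions: foo/bar is unchanged, foo/baz is new in the second one.
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "resource_fork_dir.1")
    new = os.path.join(root, "resource_fork_dir.2")
    first = store_fork("foo/bar", b"\x00\x01", None, old)
    second = store_fork("foo/bar", b"\x00\x01", old, new)
    third = store_fork("foo/baz", b"\x02", old, new)
    same_inode = os.path.samefile(
        os.path.join(old, "foo/bar"), os.path.join(new, "foo/bar")
    )
```

Each session directory then looks complete on its own, and removing an
old one only frees the forks that no later session still links to.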

> Also, I noticed that on Mac OS X the current beta version does not properly
> account for fs_abilities on any command other than backup. Example:
> I run a backup operation, it works fine.
> I run a --remove-older-than operation, it cannot find valid
> rdiff-backup-data directory (because it doesn't understand to "un-escape"
> the filenames)
> Same results if I run any of the list commands.

By "current beta version" do you mean 0.13.3 or CVS?  There is a fix in
CVS over 0.13.3 for filename quoting, which (as you noticed) was broken
when running remotely.  I don't know if it fixes the other fs_abilities
options.


-- 
Ben Escoto
