Re: [rdiff-backup-users] Resource Forks and metadata file
From: Ben Escoto
Subject: Re: [rdiff-backup-users] Resource Forks and metadata file
Date: Fri, 28 Nov 2003 17:45:39 -0800
On Fri, 28 Nov 2003 13:42:38 -0800 Daniel Hazelbaker <address@hidden> wrote:
> A sample of the mirror metadata is as follows. From the second backup the
> .snapshot.gz metadata file is 7,671,808 bytes. Uncompressed it is 26,968,064
> bytes. Grepped for just ResourceFork lines it is 26,853,164 bytes. That is
> just the small "diff" file. The full metadata file of the first snapshot is
> 817MB compressed. I'm not even going to bother trying to uncompress it, I
> am sure I will get about the same ratios.
Hmm, so the resource fork data is 99%+ of the total? When you added the
resource fork support I assumed that not many files would have resource
forks, and that they would be pretty small.
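
As a rough check of that ratio, here is the arithmetic using only the byte
counts quoted above. The variable names are illustrative, and applying the
same ratios to the 817MB file is the assumption Daniel made, not a
measurement.

```python
# Byte counts quoted from the second backup's metadata file.
compressed_bytes = 7_671_808
uncompressed_bytes = 26_968_064
resource_fork_bytes = 26_853_164

# Share of the uncompressed metadata taken up by ResourceFork lines.
fork_fraction = resource_fork_bytes / uncompressed_bytes

# How much the file grows when uncompressed.
expansion = uncompressed_bytes / compressed_bytes

# Assumption: the 817MB first-snapshot file has the same ratios.
full_compressed_mb = 817
estimated_fork_mb = full_compressed_mb * expansion * fork_fraction

print(f"fork share of metadata: {fork_fraction:.2%}")
print(f"expansion when uncompressed: {expansion:.2f}x")
print(f"estimated fork data in full snapshot: {estimated_fork_mb:.0f} MB")
```

The share comes out above 99%, and the extrapolated size lands a little
under 3GB.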
> Here is the question. Is there a way to index this file (store the offset
> into the file say, and if we need the resource fork later go back and load
> it) or some other method of "on demand" loading of the fork data out of the
> metadata file? I think even if I had to store the snapshot files
> uncompressed I could live with something like that. The other option I can
> think of is to store (again with some kind of index) the actual resource
> fork data in a separate metadata file and again load on demand. 99% of the
> files with resource forks are things like Entourage color information files.
> They will never change and never need to be diff'd because the dates & size
> will match. There has to be a way to speed this up...
Yes, I guess it makes sense to store it in separate file(s). ACLs and
EAs are already stored like this, so it wouldn't be that big a deal to
have a separate resource_forks.<...> file. However, such a file would
have to be recreated each session, so if it's big then a lot of time
would be wasted reading/writing it and lots of space would be wasted
saving old copies.
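
For reference, the "store the offset and load on demand" idea from the
question could look something like the sketch below. This is not how
rdiff-backup stores anything; write_forks, read_fork and the file name
resource_forks.data are hypothetical, and the data file is left
uncompressed so that seeking works.

```python
import os
import tempfile

def write_forks(data_path, forks):
    """Write each fork back to back; return {filename: (offset, length)}."""
    index = {}
    with open(data_path, "wb") as out:
        for name, blob in forks.items():
            # Record where this fork starts before writing it.
            index[name] = (out.tell(), len(blob))
            out.write(blob)
    return index

def read_fork(data_path, index, name):
    """Load a single fork by seeking to its recorded offset."""
    offset, length = index[name]
    with open(data_path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Small demonstration with made-up fork contents.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "resource_forks.data")  # hypothetical name
forks = {"foo/bar": b"\x00\x01icon", "foo/baz": b"color info"}
index = write_forks(data_path, forks)
loaded = read_fork(data_path, index, "foo/baz")
print(loaded)
```

Only the small index would need to be held in memory; the fork bytes are
read when a file actually needs them. It does not by itself avoid
rewriting the data file each session.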
Extrapolating, you have about 3GB of resource forks, does that sound
right? Maybe then we need a separate resource_fork_dir.<time>/
directory, which has the same directory structure as the mirror
directory, but just contains resource forks. So file foo/bar would have
its resource fork stored in
rdiff-backup-data/resource_fork_dir.<time>/foo/bar. If a file's
resource fork hasn't changed we could take a page from the rsync people
and hard link the old file to the new location. This should be easier
than recreating the whole .snapshot + .diff scheme.
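
A minimal sketch of that hard-link scheme, assuming a filesystem that
supports hard links. Nothing like this exists in rdiff-backup yet;
store_fork and the T1/T2 session names are made up for illustration, and
comparing fork contents byte for byte is just the simplest way to decide
"unchanged" here.

```python
import os
import tempfile

def store_fork(prev_dir, new_dir, relpath, fork_bytes):
    """Store one file's resource fork under new_dir.

    If the previous session has an identical fork at the same relative
    path, hard link it instead of writing a second copy.
    """
    new_path = os.path.join(new_dir, relpath)
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    prev_path = os.path.join(prev_dir, relpath) if prev_dir else None
    if prev_path and os.path.isfile(prev_path):
        with open(prev_path, "rb") as f:
            unchanged = f.read() == fork_bytes
        if unchanged:
            os.link(prev_path, new_path)  # no new data blocks used
            return "linked"
    with open(new_path, "wb") as f:
        f.write(fork_bytes)
    return "written"

# Two sessions; T1 and T2 stand in for the <time> part of the name.
root = tempfile.mkdtemp()
old = os.path.join(root, "resource_fork_dir.T1")
new = os.path.join(root, "resource_fork_dir.T2")
first = store_fork(None, old, "foo/bar", b"icon data")
second = store_fork(old, new, "foo/bar", b"icon data")
third = store_fork(old, new, "foo/baz", b"new fork")
print(first, second, third)
```

An old session directory can then be deleted on its own, since each
remaining link still refers to the data.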
> Also, I noticed that on Mac OS X the current beta version does not properly
> account for fs_abilities on any command other than backup. Example:
> I run a backup operation, it works fine.
> I run a --remove-older-than operation, it cannot find valid
> rdiff-backup-data directory (because it doesn't understand to "un-escape"
> the filenames)
> Same results if I run any of the list commands.
By "current beta version" do you mean 0.13.3 or CVS? There is a fix in
CVS over 0.13.3 for filename quoting, which (as you noticed) was broken
when running remotely. I don't know if it fixes the other fs_abilities
options.
--
Ben Escoto