
[rdiff-backup-users] Resource Forks and metadata file


From: Daniel Hazelbaker
Subject: [rdiff-backup-users] Resource Forks and metadata file
Date: Fri, 28 Nov 2003 13:42:38 -0800
User-agent: Microsoft-Entourage/10.1.4.030702.0

Greetings.

    I just ran rdiff-backup on our full file server for the first time to
begin a nightly backup routine. The first run went perfectly.  The second
run also went perfectly but took a while. The third run I am watching now,
and it has been processing the metadata file(s) for over half an hour.  The
reason is that there is so much resource fork data that Python bogs down
loading all of that information before it is needed.

    Here are some stats.

Source FS: HFS+, 238GB used (mirrored 250GB, FW800)
Dest FS: HFS+ 488GB total (striped 250GB, FW800)

A sample of the mirror metadata is as follows. From the second backup the
.snapshot.gz metadata file is 7,671,808 bytes. Uncompressed it is 26,968,064
bytes. Grepped for just the ResourceFork lines it is 26,853,164 bytes.  That
is just the small "diff" file.  The full metadata file of the first snapshot
is 817MB compressed.  I'm not even going to bother trying to uncompress it;
I am sure I will get about the same ratios.
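
For reference, the ratios implied by those numbers can be worked out with a
few lines of Python. This is only arithmetic on the figures quoted above; the
projected size of the full snapshot assumes, as guessed above, that it
compresses at about the same ratio as the small diff file.

```python
# Figures quoted above for the second backup's metadata "diff" file.
compressed = 7_671_808      # bytes, .snapshot.gz
uncompressed = 26_968_064   # bytes, after gunzip
fork_lines = 26_853_164     # bytes, ResourceFork lines only

# Share of the uncompressed metadata taken up by resource fork lines.
fork_share = fork_lines / uncompressed

# How much the file grows when uncompressed.
expansion = uncompressed / compressed

# ASSUMPTION: the 817MB full snapshot expands by the same factor.
full_estimate_mb = 817 * expansion

print(f"resource fork share: {fork_share:.2%}")
print(f"expansion factor:    {expansion:.2f}x")
print(f"full snapshot est.:  {full_estimate_mb:.0f} MB uncompressed")
```

In other words, well over 99% of the metadata is resource fork data, and the
full snapshot would likely come to a little under 3GB uncompressed.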

Here is the question. Is there a way to index this file (store the offset
into the file say, and if we need the resource fork later go back and load
it) or some other method of "on demand" loading of the fork data out of the
metadata file? I think even if I had to store the snapshot files
uncompressed I could live with something like that.  The other option I can
think of is to store (again with some kind of index) the actual resource
fork data in a separate metadata file and again load on demand. 99% of the
files with resource forks are things like Entourage color information files.
They will never change and never need to be diff'd because the dates & size
will match.  There has to be a way to speed this up...
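
A minimal sketch of the offset-index idea, in Python. It is not
rdiff-backup's code, and the record layout is an assumption made for
illustration: each record starts with a "File <path>" line, followed by
indented fields, one of which may be a "  ResourceFork <hex>" line. The
function names build_fork_index and load_fork are hypothetical. It requires
a seekable, uncompressed file opened in binary mode.

```python
import io

# ASSUMED layout, for illustration only (not the exact rdiff-backup format).
RECORD_PREFIX = b"File "
FORK_PREFIX = b"  ResourceFork "


def build_fork_index(f):
    """Scan the file once; map each path to (offset, length) of its fork hex.

    Only offsets are kept in memory, never the fork data itself.
    """
    index = {}
    current_path = None
    while True:
        offset = f.tell()          # position of the line about to be read
        line = f.readline()
        if not line:
            break
        if line.startswith(RECORD_PREFIX):
            current_path = line[len(RECORD_PREFIX):].rstrip(b"\n")
        elif line.startswith(FORK_PREFIX) and current_path is not None:
            start = offset + len(FORK_PREFIX)
            length = len(line.rstrip(b"\n")) - len(FORK_PREFIX)
            index[current_path] = (start, length)
    return index


def load_fork(f, index, path):
    """Seek back and decode one resource fork on demand (None if absent)."""
    entry = index.get(path)
    if entry is None:
        return None
    start, length = entry
    f.seek(start)
    return bytes.fromhex(f.read(length).decode("ascii"))


# Tiny in-memory stand-in for an uncompressed metadata file.
sample = (
    b"File a.txt\n  Type reg\n  Size 3\n"
    b"File b.rsrc\n  Type reg\n  ResourceFork 00ff10\n  Size 0\n"
)
f = io.BytesIO(sample)
index = build_fork_index(f)
fork = load_fork(f, index, b"b.rsrc")
```

The index holds two integers per file instead of the fork itself, so a fork
is only read when a file actually needs comparing. Seeking like this does
not work directly on a gzip stream, which is why the uncompressed snapshot
trade-off mentioned above comes into it.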

Also, I noticed that on Mac OS X the current beta version does not properly
account for fs_abilities on any command other than backup. Example:
I run a backup operation and it works fine.
I run a --remove-older-than operation and it cannot find a valid
rdiff-backup-data directory (because it doesn't know to "un-escape"
the filenames).
I get the same results if I run any of the list commands.

I wrote a simple hack patch that I am currently using that runs the proper
fs_abilities before executing the bulk of those commands.  I don't recommend
its use at all, as I just threw it together and it is nowhere near up to
coding standards.  If you do want it Ben, let me know and I'll drop it to
you. I basically just cut & pasted the needed method calls rather than doing
it the "proper" way and making a support method that does it for all list
functions etc.
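
To illustrate what the "proper" way might look like, here is a generic
Python sketch of a single support wrapper that runs the filesystem check
before every command. Every name in it (detect_abilities, with_fs_abilities,
list_increments, remove_older_than) is hypothetical and stands in for the
real code; it is not taken from rdiff-backup or from the patch.

```python
import functools

calls = []  # records the order of operations, for demonstration only


def detect_abilities(repo_path):
    """HYPOTHETICAL stand-in for the filesystem abilities check."""
    calls.append(("detect", repo_path))
    return {"escape_names": True}


def with_fs_abilities(command):
    """Run the abilities check once, then hand the result to the command."""
    @functools.wraps(command)
    def wrapper(repo_path, *args, **kwargs):
        abilities = detect_abilities(repo_path)
        return command(repo_path, abilities, *args, **kwargs)
    return wrapper


@with_fs_abilities
def list_increments(repo_path, abilities):
    calls.append(("list", repo_path))
    return abilities["escape_names"]


@with_fs_abilities
def remove_older_than(repo_path, abilities, age):
    calls.append(("remove", repo_path, age))
    return abilities["escape_names"]


list_result = list_increments("/backup")
remove_result = remove_older_than("/backup", "2W")
```

With this shape each command gets the check by adding one decorator line
instead of a pasted block of setup calls.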

As soon as this operation finishes I will post the stats, just FYI, of how
long it took and what it did.

Daniel Hazelbaker
High Desert Church




