Re: [rdiff-backup-users] Resource Forks and metadata file


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Resource Forks and metadata file
Date: Fri, 28 Nov 2003 23:01:33 -0800

On Fri, 28 Nov 2003 20:47:37 -0800
Daniel Hazelbaker <address@hidden> wrote:
>     After 9 hours of it still trying to parse the metadata file and still
> not getting to the actual backup stage, I finally killed it.

Hmm, are you sure it wasn't in the actual backup stage?  There is no
separate "parse the metadata" stage; the metadata is parsed as the backup
goes along.

>     A quick look at resource fork data size shows 1.2GB of actual RF data.
> Doing some greps on that, about 200MB of it is old OS 9 stuff. Doing some
> sorting, the top 500 files (out of 93,000) comprise about another 300MB of
> RF data (most of these are things like Carbon games, apps, etc., not actual
> "data" files). My rough average, figuring 800MB of useful RF data divided
> by the number of files on the server, gives me approx 8K of RF data per file.
...
>     Ben, unless you can/want to whip out a patch to change the RF storing
> style to another method (you mentioned some ways that ACLs are done), could
> you point me in the direction of a few functions I can start looking at for
> doing that, as well as your thoughts on the best method to store the RF
> data?

I think the first step is to figure out what is taking so long.  It
could be something simple, like the ASCII-to-binary converter for the
metadata file being obscenely slow.  Perhaps you could use the Python
profiler?  I would tell you how to do this, but I have totally
forgotten how to use it.  Fortunately it's documented in the Python
library reference.
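
For what it's worth, here is a minimal sketch of how the standard
profile/pstats modules are used.  The run_backup() function is just a
stand-in for however you invoke rdiff-backup's main entry point from
Python; I haven't checked the real name:

    import profile
    import pstats

    def run_backup():
        # Stand-in: call rdiff-backup's real entry point here.
        pass

    # Dump raw timings to a file, then print the 20 most expensive
    # calls sorted by cumulative time.
    profile.run('run_backup()', 'rdiff.prof')
    pstats.Stats('rdiff.prof').sort_stats('cumulative').print_stats(20)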

>     I wonder, would there be another option to use, like bdb or something
> like that, which might make more sense? Thinking out loud...

Possibly.  For metadata I remember trying out gdbm and concluding that
flat text files were better, but this is a different situation.  The
problem with any single-file scheme is that all the data has to be
repeated each time, or else we have to use some diffing scheme, which
could get complicated.
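
Just to make that tradeoff concrete, here is a toy sketch (using
Python's generic dbm interface, not anything from rdiff-backup itself)
of keying resource fork blobs by path in one database file:

    import dbm

    # Toy sketch: store each file's resource fork bytes keyed by path.
    # A scheme like this still has to rewrite every blob on every
    # session unless some diffing is layered on top of it.
    db = dbm.open('resource_forks.db', 'c')
    db[b'/Users/daniel/Game.app'] = b'<resource fork bytes here>'
    fork_bytes = db[b'/Users/daniel/Game.app']
    db.close()

The path and contents above are made up; the point is only that a
single keyed database gives fast lookup but keeps no history.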

So each session would take up an extra 1.2GB.  That sounds like a lot:
it is still 0.5% overhead per session even when backing up 250GB.  But
it could be insignificant, depending on what the turnover for the other
files is like.
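
To put rough numbers on "depending on turnover" (the turnover figures
below are made up, purely for illustration):

    # 1.2GB of repeated fork data per session, compared against a few
    # hypothetical per-session turnover sizes for the ordinary files.
    fork_data_gb = 1.2
    for turnover_gb in (5.0, 25.0, 100.0):
        overhead = fork_data_gb / turnover_gb
        print("turnover %5.1fGB -> extra %4.1f%%" % (turnover_gb, overhead * 100))

If only a few GB of regular files change per session, the repeated fork
data dominates each increment; with heavy turnover it gets lost in the
noise.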


-- 
Ben Escoto


