Re: Re[8]: [rdiff-backup-users] Verify times increasing


From: Daniel Miller
Subject: Re: Re[8]: [rdiff-backup-users] Verify times increasing
Date: Wed, 25 Nov 2009 14:12:37 -0500


On Nov 25, 2009, at 12:25 PM, address@hidden wrote:
<snip explanation of how rdiff-backup works>

Sounds good.

So, a --verify isn't needed to "verify" the current files. The very
nature of RDB is that they're exact. (provided you trust the RDB
protocol...which we assume.)

OK, I can accept this (and this makes my backup time shorter, nice).

A --verify IS needed when you want to check an "older" version to be
sure that something hasn't borked your repository for an older delta
set. [But the "current" files are already verified, IMO]

When and why would I ever use this? If I need to restore an old backup, it would be nice to know that I have access to good data, but I'll take whatever I can get at that point. --verify doesn't seem very useful as a general repository health check (bummer).

So, your most important data, the current data, is verified.
[IMO] Progressively older delta sets are each less certain, as they
all get layered on top of each other in reverse order to get to
"older" sets. [But in general, I consider each "older" set to be
progressively less important - at least in general.]

I half agree here. I certainly agree that the most important data is the most current data. However, I would like to keep (at least) one year's worth of backup history, and I need to know that my history is good.

So, I see your problem as the following.

1) Verify that the current backup completed properly.
(I do this via logs and exit codes. I don't "double" check the
current backup by doing a --verify on the current backup set. I
implicitly trust that RDB does its job properly and that at the end
the hashes will match properly and that the current "remote" files do
equal the current "local" files. [i.e. the files that were the
source of the backup equal the backup files])

That's very trusting of you. I guess I'm a little more paranoid since my job depends on it :)

2) Verify that your older deltas are as intact as possible. That all
the meta-data, deltas and current files can be merged and rolled-back
to whatever desired end-point you want.

(This is where I use --verify - it's not perfect because there's not
a way to check every delta-set for every single file in the
repository - at least not easily. [A recursive loop checking every
version would do that, but as you say, it's going to be very resource
expensive.])

Agreed. This is where I'd like to see a new feature in rdiff-backup. I'm willing to write code if I ever get time and no one else does first.
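
Until such a feature exists, the "recursive loop checking every version" mentioned above could be approximated from outside rdiff-backup. The following is only a rough sketch, not an rdiff-backup feature: it assumes the installed version supports --list-increments with --parsable-output (one "<epoch-seconds> <type>" line per increment) and --verify-at-time with an epoch timestamp, and it assumes a non-zero exit status signals a failed check. The function names are made up for illustration.

```python
import subprocess


def parse_increment_times(parsable_output):
    """Extract epoch timestamps from `rdiff-backup -l --parsable-output` text.

    Assumption: each useful line looks like "<epoch-seconds> <type>".
    Lines that do not start with digits are ignored.
    """
    times = []
    for line in parsable_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            times.append(int(fields[0]))
    return sorted(times)


def verify_commands(repo, times):
    """Build one --verify-at-time command per increment, oldest first."""
    return [["rdiff-backup", "--verify-at-time", str(t), repo] for t in times]


def verify_all(repo):
    """Run every verification and return the timestamps that failed.

    Assumption: rdiff-backup exits non-zero when a verification fails.
    """
    listing = subprocess.run(
        ["rdiff-backup", "--list-increments", "--parsable-output", repo],
        capture_output=True, text=True, check=True,
    ).stdout
    failed = []
    for cmd in verify_commands(repo, parse_increment_times(listing)):
        if subprocess.run(cmd).returncode != 0:
            failed.append(int(cmd[2]))
    return failed
```

Calling verify_all("/path/to/repo") would check every increment in turn. Each pass has to reconstruct the files as of that time, so the total cost grows with the number of increments, which is the resource problem already noted in this thread.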

3) Verify that the data is exact from your FW800 drive to the USB
drive on the mac-mini.

(I wouldn't use a --verify for this. As long as the files are equal
from the FW drive to the USB drive, if you can --verify on the FW drive
[source] you should be able to --verify on the USB drive too. So I'd
either "trust" rsync to be sure they're equal - or do something like
you are doing - checking that the FW files are exactly equal to the
USB files.

I'd do a verify on the fastest drive on the most powerful system.
Plus you don't need to do this all the time, say once a week - over a
weekend probably works. [And perhaps a full recursive loop through
all the diffs would be possible. If you write a bash script to do
that, I'd love to have it!])

The bash script would be hugely inefficient. I'd much rather spend the time modifying rdiff-backup to support an internal consistency check.

The problem with doing it once a week is that it only ever hits one of the drives that is normally in secure storage. It would be a matter of weeks or possibly months to make sure that all drives have been verified (e.g. each time a particular drive is in use on a Friday).

To recap:
** Trust RDB does the backup properly and that source = destination
without additional checks.

** --verify the backup repository on the FW drive, and as much as
possible that all the older deltas and meta-data are intact and
functioning properly.

** check that the FW drive does copy exactly to the off-site USB
drive - but don't use --verify to accomplish this task. Just make
sure that the "off-site" repository is exactly equal to the "on-site"
FW drive.
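
One way to check that the "off-site" copy is exactly equal to the "on-site" repository, without using --verify, is a byte-by-byte comparison of the two directory trees. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only reports files from the first tree that are missing or different in the second (extra files on the second drive are not reported).

```python
import filecmp
import os


def differing_files(src_root, dst_root):
    """Return relative paths under src_root that are missing or differ in dst_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        other_dir = os.path.join(dst_root, rel_dir)
        # shallow=False forces a comparison of file contents instead of
        # trusting matching size and modification time.
        _match, mismatch, errors = filecmp.cmpfiles(
            dirpath, other_dir, filenames, shallow=False
        )
        # "errors" holds files that could not be compared, e.g. missing ones.
        for name in mismatch + errors:
            problems.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(problems)
```

An empty result from differing_files(fw_repo, usb_repo) means every file on the first drive has an identical counterpart on the second. It reads both drives in full, so it is no cheaper than a checksum pass.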

I never do a direct compare between the two drives. I just use rsync to copy from the FW to the USB drive. Here are my concerns: without some type of regularly executed integrity check of the data on the drive (FW or USB), how would I detect that a drive is failing before the failure is catastrophic and the bad data has propagated to all of the redundant USB drives? Will rdiff-backup and/or rsync tell me if the drive is failing when they do a backup/copy? (I don't think so.) The only way to know that the data is good in my setup is to run some type of consistency check on the USB drive each day after the rsync is complete. If that fails, then I know I have a problem somewhere.

BTW, it looks like yafic won't work for me now either. There seems to be a bug that causes it to stop halfway through the check :(

So back to the drawing board (or google) to find a different utility to do the integrity check.
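
For reference, the kind of integrity check being looked for here can be sketched in a few lines of Python: record a SHA-256 hash for every file once, then re-read the drive later and report anything that is missing or has changed. This is an illustrative stand-in, not a replacement for a dedicated tool, and all function names are invented. It also assumes the recorded files are not expected to change, which in an rdiff-backup repository holds for old increment files but not for the current mirror, so a manifest of the mirror would need to be rebuilt after each backup.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def check_manifest(root, manifest):
    """Return the recorded paths that are now unreadable, missing, or changed."""
    bad = []
    for rel, expected in manifest.items():
        try:
            if sha256_of(os.path.join(root, rel)) != expected:
                bad.append(rel)
        except OSError:
            # A read error is exactly what a failing drive tends to produce.
            bad.append(rel)
    return sorted(bad)
```

A manifest built from the FW drive can also be checked against the USB drive after the rsync run, since the paths are stored relative to the root. Files added after the manifest was built are simply not checked.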

Thanks a lot for your input and generously patient explanations, Greg. I do value your input.

~ Daniel
