
Re[10]: [rdiff-backup-users] Verify times increasing


From: listserv . traffic
Subject: Re[10]: [rdiff-backup-users] Verify times increasing
Date: Wed, 25 Nov 2009 11:51:50 -0800

[Inline]

> On Nov 25, 2009, at 12:25 PM, address@hidden wrote:
>> <snip explanation of how rdiff-backup works>

> Sounds good.

>> So, a --verify isn't needed to "verify" the current files. The very
>> nature of RDB is that they're exact. (provided you trust the RDB
>> protocol...which we assume.)

> OK, I can accept this (and this makes my backup time shorter, nice).

>> A --verify IS needed when you want to check an "older" version to be
>> sure that something hasn't borked your repository for an older delta
>> set. [But the "current" files are already verified, IMO]

> When and why would I ever use this? If I need to restore an old backup
> it might be nice to know that I have access to good data, but I'll  
> take whatever I can get at that point. --verify doesn't seem to be  
> very useful to do a general repository health check (bummer).

Well, the "repository" is the "current" files and then the meta-data
and rdiffs to get to previous versions of the files.

It does check the repository. When a backup is done, RDB stores a SHA1
hash of each "source" file.

So a --verify that completes successfully does the following: it takes
a "current" file and applies all the relevant rdiffs in the order the
meta-data says they should be applied. Once done, it calculates a SHA1
hash for the "restored" file and compares it to the SHA1 hash that was
stored when the file was backed up on the relevant date.

If the two match, we know the system worked properly.

So, a --verify back to the oldest backup does do a fairly
comprehensive check - just not an exhaustive one. It does verify that
the meta-data and rdiffs for a lot of the system work and aren't
corrupt.
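
For concreteness, here's roughly what that looks like on the command
line. (The path is a placeholder; I believe --verify-at-time takes the
same time formats as --restore-as-of, e.g. "1Y" for one year ago.)

    # see what backup sets exist in the repository
    rdiff-backup --list-increments /path/to/repo

    # then verify the reconstruction of, say, the oldest set
    rdiff-backup --verify-at-time 1Y /path/to/repo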

Again, it's not the exhaustive check I'd like - but it's not half bad
either.

An exhaustive, deterministic check over the whole repository would
certainly be nice, but I think the job is do-able the way things are
now.

>> So, your most important data the current data is verified.
>> [IMO] Progressively older delta sets are each less certain, as they
>> all get layered on top of each other in reverse order to get to
>> "older" sets. [But in general, I consider each "older" set to be
>> progressively less important - at least in general.]

> I half agree here. I certainly agree that the most important data is  
> the most current data. However, I would like to keep (at least) one  
> years worth of backup history, and I need to know that my history is  
> good.

>> So, I see your problem as the following.
>>
>> 1) Verify that the current backup completed properly.
>> (I do this via logs and exit codes. I don't "double" check the
>> current backup by doing a --verify on the current backup set. I
>> implicitly trust that RDB does it's job properly and that at the end
>> the hashes will match properly and that the current "remote" files do
>> equal the current "local" files. {i.e. the files that were the
>> source of the backup equal the backup files)

> That's very trusting of you. I guess I'm a little more paranoid since
> my job depends on it :)

Well, RDB creates SHA1 hashes of both files and then compares them;
if they differ, it does all the work needed to make them the same.

Doing another SHA1 hash compare at the end seems redundant.

Either you trust that the RDB protocol does what it says it does, or
you don't. If you don't, then don't use the tool. [I'm being a bit
bombastic, but I think you get the point...]

And doing a --verify won't get you there, since it's just "verifying"
the file (reconstructed or not) against the SHA1 hash generated by RDB
at the backup date/time. [If you don't trust RDB, then you shouldn't
trust its stored SHA1 hash or its verify either, OK? :)]
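
[Aside: if you do want a belt-and-suspenders check of the source
against the repository, I believe newer versions of RDB (1.1+) ship a
built-in compare mode that uses the stored hashes - the paths here are
placeholders:]

    # compare the live source tree against the repository's stored SHA1s
    rdiff-backup --compare-hash /source/dir /path/to/repo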

>> 2) Verify that your older delta's are as intact as possible. That all
>> the meta-data, deltas and current files can be merged and rolled-back
>> to whatever desired end-point you want.
>>
>> (This is where I use --verify - it's not perfect because there's not
>> a way to check every delta-set for every single file in the
>> repository - at least not easily. [A recursive loop checking every
>> version would do that, but as you say, it's going to be very resource
>> expensive.])

> Agreed. This is where I'd like to see a new feature in rdiff-backup.  
> I'm willing to write code if I ever get time and no one else does first.

Agreed - a deterministic, full-repository check would be excellent!
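
In the meantime, here's a rough (untested!) bash sketch of the
brute-force loop we keep talking about. It assumes --parsable-output
and --verify-at-time behave as the man page describes, the repo path
is a placeholder, and it will be painfully slow on a big repository:

    #!/bin/bash
    # Verify every backup set in a repository, one --verify-at-time
    # run per increment. REPO is a placeholder path.
    REPO=/path/to/repo

    # --parsable-output makes --list-increments print lines of the
    # form "<seconds-since-epoch> <type>", and --verify-at-time
    # accepts plain epoch seconds as a time spec.
    rdiff-backup --parsable-output --list-increments "$REPO" |
    while read -r stamp type; do
        echo "Verifying backup set at $stamp ($type)..."
        rdiff-backup --verify-at-time "$stamp" "$REPO" ||
            echo "VERIFY FAILED at $stamp" >&2
    done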

>> 3) Verify that the data is exact from your FW800 drive to the USB
>> drive on the mac-mini.
>>
>> (I wouldn't use a --verify for this. As long as the files are equal
>> from the FW drive to the USB drive, if you can --verify on the FW  
>> drive
>> [source] you should be able to --verify on the USB drive too. So I'd
>> either "trust" rsync to be sure they're equal - or do something like
>> you are doing - checking that the FW files are exactly equal to the
>> USB files.
>>
>> I'd do a verify on the fastest drive on the most powerful system.
>> Plus you don't need to do this all the time, say once a week - over a
>> weekend probably works. [And perhaps a full recursive loop through
>> all the diffs would be possible. If you write a bash script to do
>> that, I'd love to have it!])

> The bash script would be hugely inefficient. I'd much rather spend the
> time modifying rdiff-backup support an internal consistency check.

> The problem with doing it once a week is that it only ever hits one of
> the drives that is normally in secure storage. It would be a matter of
> weeks or possibly months to make sure that all drives have been  
> verified (e.g. each time a particular drive is in use on a Friday).

>> To recap:
>> ** Trust RDB does the backup properly and that source = destination
>> without additional checks.
>>
>> ** --verify the backup repository on the FW drive, and as much as
>> possible that all the older deltas and meta-data are intact and
>> functioning properly.
>>
>> ** check that the FW drive does copy exactly to the off-site USB
>> drive - but don't use --verify to accomplish this task. Just make
>> sure that the "off-site" repository is exactly equal to the "on-site"
>> FW drive.

> I never do a direct compare between the two drives. I just use rsync  
> to copy from the FW to the USB drive. Here's my concerns: without some
> type of regularly executed integrity check of the data on the drive  
> (FW or USB), how would I detect that a drive is failing before it is  
> catastrophic and the bad data has propagated to all of the redundant  
> USB drives? Will rdiff-backup and/or rsync tell me if the drive is  
> failing when they do a backup/copy? (I don't think so) The only way  
> know that the data is good in my setup is to run some type of  
> consistency check on the USB drive each day after the rsync is  
> complete. If that fails then I know I have a problem somewhere. BTW it
> looks like yafic won't work for me now either. there seems to be a bug
> that causes it to stop half-way through the check  :(

IMO, the key piece is the FW drive and the main repository. There's
nothing on the USB drives that isn't on the FW drive. [i.e. the FW
drive is a superset of everything on the "off-site" USB drives,
right?]

If you can successfully verify the FW drive and keep its verify
interval shorter than the time it takes to cycle through all the
"off-site" drives, you're golden. [So, if the FW repository fails,
you'll know before you overwrite all the "off-site" drives.]
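
[And if you ever do want a direct compare between the two drives, you
don't need a separate utility - rsync itself can do a checksum-only
dry run. Something like this (paths are placeholders) should print
nothing if the two trees' contents match:]

    # -r recurse, -n dry run, -c compare by checksum, -i itemize diffs
    rsync -rnci --delete /Volumes/FW/repo/ /Volumes/USB/repo/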

> So back to the drawing board (or google) to find a different utility  
> to do the integrity check.

> Thanks a lot for your input and generously patient explanations, Greg.
> I do value your input.

> ~ Daniel









reply via email to

[Prev in Thread] Current Thread [Next in Thread]