On 2/15/24 09:47, Dominic Raferd wrote:
...snip...
> So the only way to be confident about *all* the data in a repository
> is to use 'rdiff-backup verify' to verify each and every backup
> session in each repository; and this includes verifying the current
> 'mirror' session (even though it is held in the clear in the
> repository). This needs to be done with reasonable frequency to
> ensure that backed-up data has not deteriorated (e.g. through media
> bitrot).
That's the way I do it. My verification is done in conjunction with my
periodic (~weekly) sync of my primary backup archives to separate
media. I verify all of the new levels that are being synced plus at
least one more level to ensure that the new levels mesh properly with
the ones already synced.
> All of which takes a lot of computing power and time, much of which
> is duplication of effort (because, as stated above, the verification
> of the earliest session in a repository confirms the integrity of all
> later versions of files that it contains, but it is not possible to
> exclude these files from re-verification for more recent sessions).
Actually, verifying only the earliest session is not sufficient to
verify the intermediate levels. Let's say one block of a reverse-diff
file for backup level -3 gets corrupted. That's going to cause a
verification failure for level -3. But if a diff for level -5 replaces
that same block in the file, then level -5 and all older levels will
verify correctly. Only levels -3 and -4 will fail.
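To make that scenario concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: a "file" is a list of named blocks, a reverse diff is a dict of the blocks that differ in the next older version, and every name in it (digest, make_reverse_diffs, restore, failing_levels) is made up for illustration. Real rdiff deltas and rdiff-backup's stored hashes are more involved, but the masking effect is the same.

```python
import hashlib


def digest(blocks):
    """Hash a version of the toy file (stands in for the stored checksum)."""
    return hashlib.sha256("|".join(blocks).encode()).hexdigest()


def make_reverse_diffs(versions):
    """versions[0] is the mirror, versions[k] is level -k.

    diffs[k-1] holds only the blocks that change when stepping from
    level -(k-1) back to level -k.
    """
    diffs = []
    for newer, older in zip(versions, versions[1:]):
        diffs.append({i: old for i, (new, old) in enumerate(zip(newer, older))
                      if new != old})
    return diffs


def restore(mirror, diffs, level):
    """Rebuild level -level by applying reverse diffs to the mirror in order."""
    blocks = list(mirror)
    for diff in diffs[:level]:
        for index, value in diff.items():
            blocks[index] = value
    return blocks


def failing_levels(mirror, diffs, hashes):
    """Return the levels whose restored content no longer matches its hash."""
    return [-k for k in range(len(hashes))
            if digest(restore(mirror, diffs, k)) != hashes[k]]


# Mirror first, then levels -1 .. -6. Block 2 changes at level -3 and
# again at level -5, but not at level -4.
versions = [
    ["a0", "b0", "c0", "d0"],  # mirror
    ["a1", "b0", "c0", "d0"],  # -1
    ["a1", "b2", "c0", "d0"],  # -2
    ["a1", "b2", "c3", "d0"],  # -3
    ["a1", "b2", "c3", "d4"],  # -4
    ["a1", "b2", "c5", "d4"],  # -5
    ["a6", "b2", "c5", "d4"],  # -6
]
hashes = [digest(v) for v in versions]
diffs = make_reverse_diffs(versions)

diffs[2][2] = "CORRUPT"  # damage block 2 in the reverse diff for level -3

print(failing_levels(versions[0], diffs, hashes))  # prints [-3, -4]
```

Verifying only level -6 here would report success, even though two intermediate levels can no longer be restored correctly.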
There is no substitute for verifying each and every level of the
backup archive. I have a script that does verification of 8 levels in
parallel on a system with a lot of memory. Because those threads are
for the most part all reading the same files, all but the first get
that data from the kernel's buffer cache and do not incur any I/O
delay. I find that 8 threads in parallel execute almost as fast as a
single thread. I have 64 GB of RAM to play with, and my machine isn't
doing much else while I'm syncing backups, so YMMV. Trying to do this
on a Raspberry Pi would be an entirely different story.
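For anyone who wants to try the same approach, here is a minimal sketch in Python. It is not my actual script. It assumes the rdiff-backup 2.2+ command line ('rdiff-backup verify --at TIME REPO') and the "NB" time format meaning "N backups ago"; check the man page for your version before relying on either. The function names and the repository path are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(repo, sessions_back):
    """Build the verify command for one backup level.

    Assumption: rdiff-backup 2.2+ syntax, where "3B" means the third
    previous backup and "0B" the current mirror.
    """
    return ["rdiff-backup", "verify", "--at", f"{sessions_back}B", repo]


def verify_levels(repo, levels, max_workers=8, run=subprocess.run):
    """Verify several levels concurrently.

    Returns {level: True/False}, where True means the command exited
    with status 0. The 'run' parameter exists so the function can be
    exercised without rdiff-backup installed.
    """
    def check(level):
        result = run(build_command(repo, level))
        return level, result.returncode == 0

    # Each worker thread only waits on a child process, so the reads
    # overlap and later workers can be served from the kernel's cache.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(check, levels))
```

Calling verify_levels("/path/to/repo", range(8)) would check the mirror and the seven sessions before it, eight at a time. Whether that runs anywhere near as fast as a single verify depends on how much RAM is free for caching.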