Re: [rdiff-backup-users] About backups and increments
From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] About backups and increments
Date: Mon, 22 Aug 2011 21:26:10 +0200 (CEST)
On Mon, 22 Aug 2011, Robert Nichols wrote:
About space requirements: I assume the space required for the backup is:
- the space of the source files themselves
- the space of all the increments
- extra space required to compute the increment?
* Is this space stored on the source or destination drive?
* This should be the size of the file currently computed + its increments, right? So
should I assume that to back up the *second* increment of some space X
(where X can possibly be just one huge file) I need at least X * 2
space for the backup, just for temporary files?
* This brings me back to my first question: what happens when the
destination is full?
I'm not aware of any extra space needed for computing the increment, but the
increment itself, of course, does need to be stored on the destination drive.
If the destination drive runs out of space, the rdiff-backup session will
fail.
If it is detected that a file has changed (based on file attributes), a
new file in the destination directory is created using a "temp name", and
it is synced to its new contents, using the old version to speed up the
rsync process. After that, an increment is created, and only then will the
old version be removed.
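To make that order of operations concrete, here is a minimal Python sketch of updating one changed file. It is a simplification under stated assumptions, not rdiff-backup's code: the new contents are copied in plainly instead of rsync-style, the "increment" is a whole copy of the old file rather than a reverse diff, and the function name and increment file naming are hypothetical. It does show why the extra space is bounded by one file plus its increment: between steps 1 and 3, the old version, the temporary file and the increment all exist at once.

```python
import os
import shutil
import tempfile

def update_mirror_file(source_path, mirror_path, increments_dir, stamp):
    """Replace one mirror file in the order described above (hypothetical helper).
    Simplified: plain copy instead of rsync, whole old file instead of a reverse diff."""
    mirror_dir = os.path.dirname(os.path.abspath(mirror_path))
    # 1. Build the new version under a temporary name next to the old one.
    fd, tmp_path = tempfile.mkstemp(dir=mirror_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(source_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
        shutil.copymode(source_path, tmp_path)
        # 2. Write the increment while the old version still exists (made-up naming).
        increment_path = os.path.join(
            increments_dir, f"{os.path.basename(mirror_path)}.{stamp}.snapshot")
        shutil.copy2(mirror_path, increment_path)
    except BaseException:
        os.remove(tmp_path)  # abort: the old mirror file is left untouched
        raise
    # 3. Only now does the old version disappear (rename over it, atomic on POSIX).
    os.replace(tmp_path, mirror_path)
    return increment_path
```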
This process is followed sequentially for all files, so the total space
needed would be the space for the increments that are created during this
session, plus the size of the largest file in the repository.
Of course, you usually don't know in advance how large the increments will
be...
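As a worked version of that estimate, the hypothetical helper below just adds up the increments written in one session and the largest single file. It assumes you already know the increment sizes (which, as said, you usually don't) and it ignores growth of the files themselves and metadata. Passing all file sizes gives the bound stated above; passing only the changed files gives a tighter one, since only changed files get a temporary copy.

```python
def destination_headroom(increment_sizes, file_sizes):
    """Rule of thumb: free space needed on the destination on top of the
    existing mirror, when files are updated one at a time."""
    # Every increment written during the session stays on disk...
    increments = sum(increment_sizes)
    # ...but only one temporary copy exists at any moment, so the largest file bounds it.
    largest_temp = max(file_sizes, default=0)
    return increments + largest_temp
```

For example, increments of 2, 5 and 1 MB with a 700 MB largest file give `destination_headroom([2, 5, 1], [700, 40, 3])`, i.e. 708 MB of headroom.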
I don't really understand what you mean by 'the second increment'.
Worst case would be that you'd need the current size of the source, plus
the total size of your last backup including all increments (if everything
in the tree is replaced by something else), plus a small metadata
overhead. If you repeat for a second increment and again all data has been
replaced by other data, you would again need the current source size plus
the total size of the backup tree.
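The worst case compounds from session to session. A small sketch, assuming every session replaces all data with unrelated content and that each increment is then about as large as the data it replaces (no compression gain); the function name and the per-session metadata overhead parameter are illustrative:

```python
def worst_case_repository_size(source_sizes, metadata_overhead=0):
    """Repository size after each session when every session replaces all data
    with unrelated content (sizes oldest first, any consistent unit)."""
    repo = 0
    history = []
    for size in source_sizes:
        # new mirror (current source) + whole previous repository (now increments) + metadata
        repo = size + repo + metadata_overhead
        history.append(repo)
    return history
```

With a constant 100 GB source and no overhead, this gives 100, 200 and 300 GB after the first three sessions: the repository grows by the full source size each time.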
If, however, the data you backup changes only slightly or is mostly
'append-only' data like log files, each time the space used by increments
would be quite limited.
It all depends on your data set...
About backup speed: rdiff-backup doesn't seem to support both
backing up *and* pruning the increments at the same time (yes, I've
read the man page). Yet this sounds like a very sensible thing to
do: knowing that you will prune several old increments, you can avoid
calculating the reverse diffs. Has this been considered?
There's not much point in combining those two, totally independent actions.
Computing the reverse diffs for session N vs. session N-1 is totally
independent of the existence (or lack thereof) of earlier sessions in the
archive.
Adding to that:
One will always have to calculate a reverse diff to go from the newly
synced (N) version to the previous (N-1) version. If someone wants to
avoid calculating reverse diffs for a file, that is the same as having no
history at all. Better use rsync then, instead of rdiff-backup...
If you don't calculate a reverse-diff for a file, you won't be able to
regress a backup run that failed half-way through... leaving you with a
useless backup.
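A toy model may help show why session N's reverse diff depends only on versions N and N-1, and why leaving it out breaks everything older. This is a line-based reverse delta built with Python's difflib.SequenceMatcher, an assumption made purely for illustration; rdiff-backup itself works with binary deltas via librsync. All function names are hypothetical.

```python
from difflib import SequenceMatcher

def make_reverse_delta(new, old):
    """Ops that rebuild `old` from `new` (both lists of lines)."""
    ops = []
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse lines that `new` already has
        elif tag in ("replace", "insert"):
            ops.append(("data", old[j1:j2]))  # lines only the old version had
        # "delete": lines only in `new`, so they are simply not copied
    return ops

def apply_reverse_delta(new, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

def run_sessions(versions):
    """Back up each version in turn; return the mirror and one increment per older session."""
    mirror, increments = versions[0], []
    for new in versions[1:]:
        # Session N only looks at versions N and N-1; older history is never consulted.
        increments.append(make_reverse_delta(new, mirror))
        mirror = new
    return mirror, increments

def restore(mirror, increments, k):
    """Rebuild versions[k] by walking back from the mirror through every newer increment."""
    current = mirror
    for delta in reversed(increments[k:]):
        current = apply_reverse_delta(current, delta)
    return current
```

In this model, restoring version k has to pass through every newer increment, so a single missing increment makes all older versions of that file unreachable.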
But!
Maybe I now know what I didn't understand in your line of questioning.
With rdiff-backup, increments are for individual files, and only when
these individual files have been changed. So, there are no reverse diffs
if a file has not been changed. For a data set of 1000 files with only 10
files changing since the previous run, the increments dir would only
contain 10 reverse diff files for this run.
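The per-file nature of increments can be sketched as follows. Assumptions: change detection is reduced to comparing (size, mtime) pairs recorded by two runs, files added or removed between runs are left out, and the function name is hypothetical; rdiff-backup's own checks may consider more metadata.

```python
def files_needing_increments(previous, current):
    """previous/current map path -> (size, mtime) as recorded by two backup runs.
    Only files present in both runs whose attributes differ get a reverse diff."""
    return sorted(
        path
        for path, attrs in current.items()
        if path in previous and previous[path] != attrs
    )
```

With 1000 tracked files of which 10 changed, this returns exactly those 10 paths, matching the example above.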
Likewise, if a file hasn't been changed for 3 months and it is changed
today, but I only want to keep 1 month of history, I can NOT simply ditch
the 3-months old version. Maybe it wasn't changed for all these months,
but it is still yesterday's version and has to be kept in history for the
coming month minus 1 day...
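The retention rule in the previous paragraph hinges on when a version stopped being current, not on how old its content is. A minimal sketch, assuming each old version is tracked by two hypothetical dates: when its content was written, and the last backup session in which it was still the live version.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class OldVersion:
    written: date       # when this content was last modified (hypothetical field)
    last_current: date  # last backup session in which this was still the live version

def may_prune(version: OldVersion, today: date, keep: timedelta) -> bool:
    """A version may go only once it stopped being current before the retention window."""
    # `written` is deliberately ignored: the age of the content does not matter.
    return version.last_current < today - keep
```

For a file last modified 90 days ago and changed today, the old version was still current in yesterday's session, so it must be kept under a 30-day window.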
--keep-increments N (where N is the number of most recent increments to
keep, regardless of time).
[snip]
Let's say I want to keep at least 2 increments at all times (or
2 months, if that matters); I have no way to do that directly (I could
list the increments and calculate the time myself, but that's ugly).
So... let's assume you make weekly backups. (I hope you do it more often,
but this is just an example.)
You want to keep history of 2 months. That's about 8 or 9 weeks.
But sometimes you make an extra backup halfway through a week, and
sometimes you go on a vacation and don't run any backup.
So, in these cases, you might want to keep history for 2 months, but also
at least 5 increments, even if that means it will be more than 2 months?
Would it really be useful to.. eh.. keep increments from 4 months ago if
you forgot to run backups for the last 2 months? This sounds just like
"oh, I didn't make backups over the last two months, but I do happen to
have some historic versions from 3 months ago containing your PhD thesis
you've been working on... for the last 3 months....."
Let's just say that I don't think having such an option would be a really
nice thing to have ;-)
And creating a small script would indeed be far easier ;-)
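For those who still want "two months, but at least N increments", such a script could be built around a helper like the one below. It is a sketch: the function is hypothetical, it assumes you have already obtained the increment times (for instance from `rdiff-backup --list-increments`, whose output it does not parse), and the returned cutoff would still have to be converted to one of the time formats `--remove-older-than` accepts. Increments at or after the returned time are meant to be kept.

```python
from datetime import datetime, timedelta

def removal_cutoff(increment_times, keep_for, min_count, now):
    """Cutoff time keeping at least `keep_for` of history and at least
    `min_count` increments, whichever reaches further back.
    Returns None when fewer than `min_count` increments exist."""
    newest_first = sorted(increment_times, reverse=True)
    if len(newest_first) < min_count:
        return None  # not enough increments yet: remove nothing
    cutoff = now - keep_for  # time-based rule
    if min_count > 0:
        # the min_count-th newest increment must survive, so never cut after it
        cutoff = min(cutoff, newest_first[min_count - 1])
    return cutoff
```

Note how a long gap without backups reproduces the concern above: the cutoff slides back to keep 5 increments that may all be months old.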
Side note: I never automate the removal of old increments. I always do that
by hand: first without --force, to check which increment dates it announces
for removal, then with --force if that looks OK. The only thing that's
automated with respect to increment removal is a cron job reminding me of the
task. I could even modify it to remind me daily if increment removal is
due and hasn't been done yet, but for now, I keep these reminders in my inbox
until the removal is done.
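The reminder side is easy to script without touching the repository. A sketch of the check such a cron job might run, assuming the increment dates are already known (how they are collected is left out) and using a hypothetical function name; it only reports that a manual prune is due and removes nothing:

```python
from datetime import date, timedelta

def removal_due(increment_dates, today: date, keep: timedelta) -> bool:
    """True if at least one increment falls outside the retention window.
    Only reports; the actual removal stays a manual job."""
    oldest_allowed = today - keep
    return any(d < oldest_allowed for d in increment_dates)
```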
--
Maarten