Re: [Duplicity-talk] are periodic full backups necessary?
From: Peter Schuller
Subject: Re: [Duplicity-talk] are periodic full backups necessary?
Date: Sun, 20 Jan 2008 07:57:35 +0100
User-agent: KMail/1.9.7
> I'm trying to get a good understanding of the tradeoffs between
> rdiff-backup and duplicity. One of the nice things about rdiff-backup
> is that the "most recent and backward diffs" means you never need to
> do a "full backup".
Agreed.
> What are the consequences of never doing a full backup with
> duplicity's "original and forward diffs" format? Will the time
> required for an incremental backup increase in proportion to how many
> incrementals there have been since the last full backup? Or is the
> backup time independent of how "far back" the most recent full backup
> was?
I believe the fundamental problem is that in order to produce mirror n as a patch
containing the differences between n - 1 and n, you need access to the data as it
appears at both n - 1 and n.
In duplicity's case, that would mean performing a full restore of the previous
state before it could generate the next forward diff from the new version of the
files.
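To make that dependency concrete, here is a toy Python sketch of a block-based
forward diff (not duplicity's actual code; duplicity uses librsync, and the block
size and helper names here are purely illustrative). The point is that
make_delta() has to read both the n - 1 data and the n data, so the previous
state must be available in some form.

# Toy sketch only -- not duplicity's real code.
# The point: make_delta() needs to read BOTH the n-1 data and the n data.
import hashlib

BLOCK = 4096  # illustrative block size

def blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def make_delta(old, new):
    """Instructions that turn the n-1 version into the n version."""
    old_index = {hashlib.sha1(b).digest(): i for i, b in enumerate(blocks(old))}
    delta = []
    for b in blocks(new):
        i = old_index.get(hashlib.sha1(b).digest())
        # Reference an unchanged block of the old file, or ship literal data.
        delta.append(("copy", i) if i is not None else ("literal", b))
    return delta

def apply_delta(old, delta):
    old_blocks = blocks(old)
    return b"".join(old_blocks[a] if op == "copy" else a for op, a in delta)

old = b"A" * 8192 + b"B" * 4096   # state at n - 1
new = b"A" * 8192 + b"C" * 4096   # state at n: last block changed
assert apply_delta(old, make_delta(old, new)) == new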
That assumes duplicity's current archive format. If duplicity were to implement an
rdiff-backup style system with an up-to-date copy plus reverse diffs, the
situation would be different but still problematic: rdiff-backup can do what it
does efficiently because it runs on both ends of the pipe. Thanks to its use of
the rsync algorithm, it never needs to transfer the entire file (neither n - 1 nor
n) in either direction.
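As a rough sketch of why running on both ends helps (again illustrative Python,
simplified from the real rsync algorithm, which also uses a rolling weak checksum
so matches can start at any byte offset): the end holding the old copy computes
and sends only a small per-block signature, and the end holding the new copy
computes the delta against that signature, so neither full file crosses the wire.

# Sketch of the two-ended split; function names and block size are assumptions.
import hashlib

BLOCK = 4096

def signature(old):
    # Computed on the end that holds the old copy; a few bytes per block,
    # so it is cheap to send compared to the file itself.
    return [hashlib.sha1(old[i:i + BLOCK]).digest()
            for i in range(0, len(old), BLOCK)]

def delta_against_signature(sig, new):
    # Computed on the end that holds the new copy, without ever seeing
    # the old bytes; only literal data for changed blocks is sent back.
    index = {h: i for i, h in enumerate(sig)}
    out = []
    for i in range(0, len(new), BLOCK):
        b = new[i:i + BLOCK]
        j = index.get(hashlib.sha1(b).digest())
        out.append(("copy", j) if j is not None else ("literal", b))
    return out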
In addition, even if one accepted that duplicity had to run on the remote end,
that has security implications: the process running on the remote end would need
access to the files being diffed. And if you are running a remote process on
unencrypted data anyway, what was the point of using duplicity instead of
rdiff-backup in the first place?
Perhaps a compromise is possible whereby each individual file is backed up and
encrypted separately, so that the rsync algorithm can be applied to the encrypted
data even against an untrusted remote system. However, this also has security
implications: an observer can infer more about the number of files, their sizes,
the distribution of changes over time and so on than is possible with
whole-volume uploads.
Perhaps a single large "virtual" volume could be generated for a complete backup,
which would then be used to apply the rsync algorithm against the previous
volume. This assumes the rsync algorithm only requires one pass (does it?) and
that it copes well with large displacements of data within such a huge file
(probably not).
Anyone have better ideas?
--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <address@hidden>'
Key retrieval: Send an E-Mail to address@hidden
E-Mail: address@hidden Web: http://www.scode.org