From: Gregory Maxwell
Subject: [Duplicity-talk] "New file format" — a plea for an RS code (error correction)
Date: Sat, 31 Jan 2009 00:12:53 -0500

Almost everyone with more than a few years of backup experience has
had at least one terrible encounter with a single bad block during
recovery. No matter how well you verify your backups (and many people
do not at all), bits rot on media, and the one annoying bad sector
always seems to find its way to the middle of your backup files… and
you find yourself trying to piece together a good restore from
multiple backups by hand. This problem is especially challenging if
you make extensive use of incremental backups.

Increasingly dense backup media have done nothing but increase the
incidence of lost sectors and blocks.

I saw the "New file format" discussion on the webpage and wanted to
implore those working on the format to at least consider supporting
block erasure error recovery.

Reed-Solomon (RS) codes are a family of optimal block erasure codes.
For N blocks of input data they produce M blocks of output (where
M > N), and you can losslessly recover the data so long as you have
any N of the M. (You choose N and M to meet your redundancy
requirements.) This is the same scheme used by RAID-5 (M is N+1) and
RAID-6 (M is N+2).
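
To illustrate the "any N of the M" property in the simplest case,
here is a rough Python sketch of the M = N+1 (RAID-5 style) special
case using plain XOR parity; a real RS code generalizes the same idea
to arbitrary M > N:

    def xor_blocks(a, b):
        """XOR two equal-length blocks byte by byte."""
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data_blocks):
        """Return the N data blocks plus one XOR parity block (M = N+1)."""
        parity = data_blocks[0]
        for blk in data_blocks[1:]:
            parity = xor_blocks(parity, blk)
        return data_blocks + [parity]

    def recover(blocks):
        """Recover the N data blocks when at most one block is missing (None)."""
        missing = [i for i, b in enumerate(blocks) if b is None]
        if not missing:
            return blocks[:-1]
        if len(missing) > 1:
            raise ValueError("XOR parity can only repair a single erasure")
        present = [b for b in blocks if b is not None]
        rebuilt = present[0]
        for blk in present[1:]:
            rebuilt = xor_blocks(rebuilt, blk)
        repaired = list(blocks)
        repaired[missing[0]] = rebuilt
        return repaired[:-1]

    data = [b"AAAA", b"BBBB", b"CCCC"]   # N = 3 data blocks
    stored = encode(data)                # M = 4 blocks on the media
    stored[1] = None                     # one block lost
    assert recover(stored) == data       # any 3 of the 4 suffice

With a real RS code the same idea extends to repairing up to M-N lost
blocks per group.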

An RS erasure code requires that you know which blocks are bad.
Sometimes the underlying storage will report this for you, but not
always, and it shouldn't always be trusted. So it's better to also
include a per-block checksum. This adds a bit more overhead, but it's
negligible if the block size is large (and you should probably be
doing it anyway, to avoid a bad block totally desynchronizing file
decode).
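
For example, something along these lines (just a sketch; the 4-byte
CRC32 and the block size are illustrative choices, not a proposal for
the exact on-disk framing) would let a reader mark a corrupted block
as an erasure before handing the group to the RS decoder:

    import struct
    import zlib

    BLOCK_SIZE = 64 * 1024  # one of the fixed sizes suggested below

    def frame_block(payload):
        """Prefix a block with its CRC32 so corruption is detectable."""
        return struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF) + payload

    def unframe_block(framed):
        """Return the payload, or None (an erasure for the RS decoder)
        if the stored CRC doesn't match the data."""
        stored_crc = struct.unpack(">I", framed[:4])[0]
        payload = framed[4:]
        if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
            return None
        return payload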

(There are other erasure codes which could be used, such as block
LDPC codes, but RS is optimal: you always need only N of the M. The
advantage of block LDPC is that RS becomes slow on general-purpose
computers when N and M grow large (i.e. any M over 255 requires
arithmetic over a field wider than a byte, and thus wide modular
arithmetic libraries).)


This kind of redundancy could be accomplished externally to duplicity
(by creating separate check/recovery files), but such schemes
typically don't work well with incrementally updated files, don't
integrate well with recovery, and won't be discovered by users until
*after* they are needed.

Duplicity's file format could support error recovery by having
regular fixed-size blocks, a built-in value for N, and a
user-specifiable amount of redundancy. So for a modest increase in
file size users could enjoy a much more reliable backup. This
increase would be far less than the gain achieved by switching to a
more efficient compression method (e.g. LZMA over zlib).
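
To give a feel for the numbers (the parameters here are purely
illustrative, not something the format would need to adopt):

    def size_overhead(n_data, n_check, block_size, checksum_bytes=4):
        """Fractional size increase from n_check RS check blocks per
        n_data data blocks, plus a per-block checksum."""
        return n_check / n_data + checksum_bytes / block_size

    # 16 data blocks + 2 check blocks of 64 KB, with a 4-byte CRC each:
    # 12.5% parity overhead plus about 0.006% for the checksums.
    print("{:.3%}".format(size_overhead(16, 2, 64 * 1024)))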


It is best if the blocks match the underlying 'failure unit' (e.g.
512-byte sectors on disk-based storage), but since duplicity often
won't know the failure unit, and small block sizes imply greater
checksum overhead, some fixed large size (say, something in the range
of 4-64 KB) will probably make sense.
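
For instance, assuming a 4-byte CRC per block, the checksum overhead
falls off quickly as the block size grows:

    for block_size in (512, 4 * 1024, 64 * 1024):
        print("{:6d}-byte blocks: {:.3%} checksum overhead"
              .format(block_size, 4.0 / block_size))
    # 512-byte blocks: 0.781%, 4 KB: 0.098%, 64 KB: 0.006%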

zfec (http://allmydata.org/trac/zfec) is a GPLed RS library that
duplicity could potentially use.
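
For example (only a sketch based on my reading of the zfec docs; I am
assuming its systematic Encoder/Decoder interface, where encode()
returns the M-N check blocks and decode() takes any N surviving
blocks along with their block numbers, so treat the exact calls as
unverified):

    import zfec

    N, M = 3, 5                          # keep any 3 of 5 stored blocks
    data = [b"AAAA", b"BBBB", b"CCCC"]   # N equal-length data blocks

    check = zfec.Encoder(N, M).encode(data)   # assumed: the M-N check blocks
    stored = list(data) + list(check)         # M blocks written to the volume

    # Pretend blocks 0 and 2 failed their per-block checksum (erasures).
    survivors = [(1, stored[1]), (3, stored[3]), (4, stored[4])]
    nums = [num for num, _ in survivors]
    blocks = [blk for _, blk in survivors]

    recovered = zfec.Decoder(N, M).decode(blocks, nums)
    assert [bytes(b) for b in recovered] == data   # original data blocks back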

Thanks for your consideration.



