Re: [rdiff-backup-users] atomic increment files?
From: Marcel (Felix) Giannelia
Subject: Re: [rdiff-backup-users] atomic increment files?
Date: Mon, 09 Mar 2009 18:21:26 -0700
User-agent: Thunderbird 2.0.0.6 (X11/20071015)
Hi Matt,
Matthew Flaschen wrote:
> It seems to me you're very wrong. A typical restore is only going to
> touch a few files.
If a typical restore is only restoring a small part of the filesystem
and only going back a few days, you're right. But I wasn't even
concerned with restore operations -- I want the increment storage to be
more efficient so that I can archive it quickly and easily.
> Now you tried to address this with, "accessing an
> archive file's directory structure is likely faster than doing the same
> in a part of the filesystem containing many thousands of files per
> directory." But you provided no evidence whatsoever for this very
> non-obvious statement.
It doesn't strike me as non-obvious; reading the archive header of a format
that stores one (e.g. rar, zip, or 7z, as opposed to tar, which doesn't)
has always seemed faster than enumerating the same list of files in the
filesystem, when there are many files involved. That's because reading an
archive header is a single, linear disk read, whereas a large subtree
traversal involves a crapload of very small reads that are all over the
place. Seek time is where most of the waiting is done -- hard drives can
access data that happens to be under the heads in about 1 ms (or less?),
but seeking the heads takes on the order of 13-15 ms.
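To put rough numbers on the seek argument, here is a back-of-the-envelope worst case. It assumes, purely for illustration, that every file visited costs one full seek of 14 ms (the middle of the 13-15 ms range above), and it borrows the 1.5 million file count from the backup described below. Real traversals are usually cheaper, because the kernel batches and caches metadata reads, so treat the result as an upper bound rather than a measurement.

```shell
#!/bin/sh
# Worst-case estimate under ASSUMED figures: one seek per file visited,
# 14 ms per seek, 1.5 million files. Not a benchmark.
estimate=$(awk 'BEGIN {
    files = 1500000      # file count taken from the listing example
    seek_ms = 14         # assumed average seek time, in milliseconds
    total_s = files * seek_ms / 1000
    printf "%.0f seconds, about %.1f hours\n", total_s, total_s / 3600
}')
echo "$estimate"
```

Even a fraction of that bound is in the same range as an hour-long du -s, whereas reading one index file sequentially needs only a handful of seeks.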
If I need to get a directory listing of a very large subtree (for
instance, the one in my problem backup) I usually do something like this:
find . -type f -print0 | xargs -0 ls -l >> big_directory_listing
(I use find instead of ls -R because find prints the full path on every
line, which is easier to parse in scripts.)
That way, I can quickly refer to big_directory_listing without having to
traverse the subtree again. In my case, a total of about 1.5 million
files produced an output file of over 200 MB, but it was still *much*
faster to work with that text file than it was to read from the
filesystem. For instance, du -s took over an hour even the second time I
ran it [because the size of the directory tree was too big to fit into
the RAM file cache], but "cat big_directory_listing | gawk '{sum += $5}
END {print sum}'" took about 15 seconds.
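The same read-once, query-many idea can be sketched a little more robustly with GNU find's -printf, which writes just the size and path of each file instead of running ls -l on every one. This is a minimal sketch, not the exact commands used above: it assumes GNU find, and the listing location is a hypothetical choice, placed outside the tree so the listing does not end up indexing itself.

```shell
#!/bin/sh
# Sketch: walk a big, read-only tree once, then answer questions from a
# flat text index. Assumes GNU find (for -printf).
root="${1:-.}"                                    # tree to index
listing="${TMPDIR:-/tmp}/big_directory_listing"   # hypothetical path, outside the tree

# One traversal: size in bytes, a tab, then the path of each regular file.
find "$root" -type f -printf '%s\t%p\n' > "$listing"

# Total apparent size in bytes, computed from the index alone.
# printf "%.0f" keeps large sums out of exponent notation in some awks.
total=$(awk -F'\t' '{ sum += $1 } END { printf "%.0f\n", sum }' "$listing")

# Number of files indexed: one line per file, so file names that contain
# newlines would be miscounted.
count=$(wc -l < "$listing")

echo "files: $count, bytes: $total"
```

Note that this total, like the gawk sum over ls -l's fifth column, is apparent size (the sum of file lengths), which is not the allocated-block figure du -s reports; for a tree of many small files, much of the gap between the two is per-file slack space.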
> If that were true, why don't people use AVFS as
> their primary filesystem?
My points about access speed only apply to data sets that are read-only.
Updating an index like that is a relatively slow operation, and wouldn't
work very well for day-to-day filesystem use. Filesystems are designed
to make a trade-off between enumeration speed and update speed, and for
the most part they do that fairly well (and they're getting better).
But when you're storing something that you *know* is going to be
read-only and will never need to be modified again, then it makes more
sense to store a nice index at the front (which is why CDs do that).
Aside from unrecommended fiddling, people generally don't modify
rdiff-backup's increment files, so I think they would be a good
application of a nice indexed archive. Creating the index would take
almost no extra time, increments could be stored without the slack
space that separate files waste, and typical restore operations wouldn't
slow down by much (full restores from a long time ago would probably take
longer, but they already take a while).
~Felix.