Re: [rdiff-backup-users] atomic increment files?


From: Marcel (Felix) Giannelia
Subject: Re: [rdiff-backup-users] atomic increment files?
Date: Mon, 09 Mar 2009 18:21:26 -0700
User-agent: Thunderbird 2.0.0.6 (X11/20071015)

Hi Matt,

Matthew Flaschen wrote:

> It seems to me you're very wrong.  A typical restore is only going to
> touch a few files.

If a typical restore touches only a small part of the filesystem and only goes back a few days, you're right. But I wasn't even concerned with restore operations -- I want the increment storage to be more efficient so that I can archive it quickly and easily.

> Now you tried to address this with, "accessing an
> archive file's directory structure is likely faster than doing the same
> in a part of the filesystem containing many thousands of files per
> directory."  But you provided no evidence whatsoever for this very
> non-obvious statement.

It doesn't strike me as non-obvious; listing the contents of an archive format that stores an index (e.g. rar, zip, or 7z, as opposed to tar, which doesn't) has always seemed to go faster than enumerating the same list of files in the filesystem, once many files are involved. That's because reading an archive index is a single, linear disk read, whereas a large subtree traversal involves an enormous number of tiny reads scattered all over the disk. Seek time is where most of the waiting happens -- a hard drive can access data that happens to be under the heads in about 1 ms (or less), but moving the heads takes on the order of 13-15 ms.
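
To put rough numbers on it (illustrative figures only -- the 13-15 ms seek is from above, and ~50 MB/s of sequential throughput is an assumption about a typical drive): touching 1.5 million scattered inodes at even one seek apiece is about 1,500,000 x 0.014 s, or nearly six hours of pure seeking, while reading a 200 MB contiguous index is 200 MB / 50 MB/s, about 4 seconds. Caching and readahead soften the first number considerably in practice, but the ratio is the point.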

If I need to get a directory listing of a very large subtree (for instance, the one in my problem backup) I usually do something like this:

find . -type f -print0 | xargs -0 ls -l >> big_directory_listing

(I use find instead of ls -R because find prints the full path on every line, which is easier to parse in scripts.)

That way, I can quickly refer to big_directory_listing without having to traverse the subtree again. In my case, about 1.5 million files produced an output file of over 200 MB, but it was still *much* faster to work with that text file than to read from the filesystem. For instance, du -s took over an hour even the second time I ran it [the directory tree's metadata was too big to fit in the RAM file cache], but "cat big_directory_listing | gawk '{sum += $5} END {print sum}'" took about 15 seconds.
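
The same listing file can stand in for other traversals, too. For instance (a rough one-liner, assuming plain "ls -l" output where field 5 is the size; filenames with spaces will mangle the name column but not the sort):

sort -rn -k5 big_directory_listing | head

prints the ten biggest files without another trip through the tree.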

> If that were true, why don't people use AVFS as
> their primary filesystem?

My points about access speed only apply to data sets that are read-only. Updating an index like that is a relatively slow operation, and wouldn't work very well for day-to-day filesystem use. Filesystems are designed to make a trade-off between enumeration speed and update speed, and for the most part they do that fairly well (and they're getting better).

But when you're storing something that you *know* is read-only and will never need to be modified again, it makes more sense to store a nice index up front (which is why CDs do that). Aside from unrecommended fiddling, people generally don't modify rdiff-backup's increment files, so I think they would be a good application for a nice indexed archive. Creating the index would add almost no time, increments could be stored without the extra slack space that separate files cause, and typical restore operations wouldn't slow down by much (full restores from a long time ago would probably take longer, but they already take a while).
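
As a rough sketch of what I mean (the paths here are hypothetical, and I'm using zip only because it keeps a central index -- rdiff-backup doesn't do any of this today):

cd /backup/rdiff-backup-data
zip -r -0 increments-archive.zip increments/
unzip -l increments-archive.zip | tail

The "unzip -l" listing comes out of the archive's central index in one linear read, rather than from a walk over thousands of little increment files, and -0 stores the increments without compressing them.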

~Felix.




