rdiff-backup-users

Re: [rdiff-backup-users] Tar replacement - format proposal


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Tar replacement - format proposal
Date: Fri, 26 Sep 2003 18:39:43 -0700

On Fri, 26 Sep 2003 09:21:02 -0500
John Goerzen <address@hidden> wrote:
> One problem I see is XML.  Yes, it is versatile, but it is also
> overkill for this and it is complex.  An archive should be as simple
> as possible, and it should be able to be restored with as few tools
> as possible.  XML is not simple, and using it will generally require
> a libxml on the system.  This can right away put your format out of
> the running for things like installers, critical system backups, and
> anything else that is extremely space-conscious for program
> footprint and required shared libraries.
> 
> Moreover, you don't need XML for what you are trying to achieve.
> All you need is something "more versatile than tar".  It shouldn't
> be hard to arrive at a key/value system.  For instance, for files,
> you could have:
> 
>   NAME\0/foo/bar/baz\0
>   MTIME\0514341312\0
>   CTIME\03413413214\0
> 
> Of course, you could write the MTIME and CTIME in binary, and you
> could abbreviate those names to "N", "M", and "C" to save time.
> What's more, this format is quite extensible, almost to the same
> degree as XML, and you save space in the archive and space in memory
> and library requirements, not to mention ease of processing.

KS> Another important consideration is that the necessary tools to
KS> extract an archive should be as compact and self reliant as
KS> possible as they are the kind of tools you need on rescue floppies
KS> and the like.

Firstly, duplicity and rdiff-backup are written in Python, which
already takes up a lot of space, so I don't see footprint as a real
problem here.  The last rescue disk I used was Knoppix, which even
has OpenOffice and Mozilla on it.  The Red Hat installer is written
in Python.  It may be a bit anachronistic to design for a 1.44MB
rescue floppy instead of a 600+MB CD.  Besides, is XML really that
heavy?  At least simple XML can be generated and parsed very easily.
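
To give a rough idea of what "simple XML" could mean here, the sketch
below writes and reads one file's metadata using only the Python
standard library.  The element names (file, name, mtime, ctime) and
the helper functions are made up for illustration; they are not a
proposed schema and not anything rdiff-backup actually does.

```python
import xml.etree.ElementTree as ET


def file_to_xml(name, mtime, ctime):
    """Serialize one file's metadata as a small XML string.

    The element names are hypothetical, chosen only for this sketch.
    """
    elem = ET.Element("file")
    ET.SubElement(elem, "name").text = name
    ET.SubElement(elem, "mtime").text = str(mtime)
    ET.SubElement(elem, "ctime").text = str(ctime)
    return ET.tostring(elem, encoding="unicode")


def xml_to_file(text):
    """Parse the string produced by file_to_xml back into a dict."""
    elem = ET.fromstring(text)
    return {
        "name": elem.findtext("name"),
        "mtime": int(elem.findtext("mtime")),
        "ctime": int(elem.findtext("ctime")),
    }


# Same sample values as the quoted NAME/MTIME/CTIME example.
record = file_to_xml("/foo/bar/baz", 514341312, 3413413214)
restored = xml_to_file(record)
```

Whether this counts as "light enough" still depends on having an XML
parser available wherever the archive has to be restored.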

Secondly, and more importantly I think, not all the metadata may fit
into that field-data format.  rdiff-backup stores metadata, and I
thought about using XML, rejected it as being unnecessary, and then
used a very simple system like the one you describe.  However, what do
we do with ACLs?  A file can have two ACLs, and each ACL can have a
number of ACL entries.  Each ACL entry can have a type, a user/group
id, and a permission set.  Once the data becomes hierarchical, it may
be easier just to use XML than to invent various sub-encodings for
each bit of data.
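
The contrast can be sketched roughly as follows.  The first half
encodes a record in the NUL-separated key/value style from the quoted
proposal and has to squeeze the ACLs into strings with an ad hoc
"type:id:perms" sub-encoding; the second half expresses the same data
as nested XML.  Everything here is an assumption made for the example:
the key names, the "access"/"default" labels for the two ACLs, the
separators, and the XML layout.  It is not rdiff-backup's real format.

```python
import xml.etree.ElementTree as ET


def encode_flat(fields):
    """Encode str->str pairs as KEY\\0VALUE\\0, as in the quoted proposal."""
    out = b""
    for key, value in fields.items():
        out += key.encode() + b"\0" + value.encode() + b"\0"
    return out


def decode_flat(data):
    """Inverse of encode_flat."""
    parts = data.split(b"\0")[:-1]  # drop the empty piece after the last NUL
    return {parts[i].decode(): parts[i + 1].decode()
            for i in range(0, len(parts), 2)}


# Two ACLs per file, each a list of (type, id, permissions) entries.
# The values are invented sample data.
acls = {
    "access": [("user", "1000", "rw-"), ("group", "100", "r--")],
    "default": [("user", "1000", "rwx")],
}

# Flat format: each ACL needs its own hand-rolled sub-encoding, and the
# reader needs a matching hand-rolled parser for it.
flat = encode_flat({
    "N": "/foo/bar/baz",
    "ACL_ACCESS": ",".join(":".join(e) for e in acls["access"]),
    "ACL_DEFAULT": ",".join(":".join(e) for e in acls["default"]),
})
decoded = decode_flat(flat)


def acls_to_xml(name, acls):
    """Build a nested XML element: file -> acl -> entry."""
    root = ET.Element("file", name=name)
    for kind, entries in acls.items():
        acl = ET.SubElement(root, "acl", kind=kind)
        for etype, ident, perms in entries:
            ET.SubElement(acl, "entry", type=etype, id=ident, perms=perms)
    return root


root = acls_to_xml("/foo/bar/baz", acls)
```

The flat version works, but every new nested field needs another
private mini-syntax, while the XML version reuses one generic parser.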

Assuming all the metadata is placed together in the index, XML will
typically compress very well.  For instance, on my machine gzip
compresses the metadata of my rdiff-backup directory from 77MB to 6MB
(about a factor of 13).
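
For anyone who wants to try this on their own data, here is a small
sketch that builds a synthetic, repetitive XML index and measures how
much gzip shrinks it.  The index layout, file names, and entry count
are invented for the example, so the ratio it reports says nothing
about real metadata; the 77MB and 6MB figures above come from my own
directory, not from this code.

```python
import gzip


def make_index(n):
    """Build a synthetic XML index with n similar file entries (made-up layout)."""
    lines = ["<index>"]
    for i in range(n):
        lines.append(
            f"<file><name>/home/user/doc{i}.txt</name>"
            f"<mtime>{1064600000 + i}</mtime><perms>0644</perms></file>"
        )
    lines.append("</index>")
    return "\n".join(lines).encode()


raw = make_index(10000)
packed = gzip.compress(raw)

# Compression factor: uncompressed size divided by compressed size.
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes (factor {ratio:.1f})")
```

The repeated tag names are exactly the kind of redundancy gzip removes,
which is why XML's verbosity costs less on disk than it might seem.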


-- 
Ben Escoto


