Re: re[2]: [rdiff-backup-users] ACLs/EAs (who is the audience?)


From: Ben Escoto
Subject: Re: re[2]: [rdiff-backup-users] ACLs/EAs (who is the audience?)
Date: Sat, 08 Mar 2003 00:26:36 -0800

>>>>> "GF" == Greg Freemyer <address@hidden>
>>>>> wrote the following on Fri, 7 Mar 2003 17:44:26 -0500

  GF> Samba treats ACLs as a ./configure parameter, not run time.
  GF> Maybe that would be better for rdiff-backup as well.

Well right now rdiff-backup does not have a ./configure script.  The
compile-time stuff doesn't really feel like the "pythonic" way of
doing things.  Maybe rdiff-backup should try to import some Python
acl/ea library (see for instance

http://pylibacl.sourceforge.net/
http://pyxattr.sourceforge.net/

) and enable acls/eas if it finds them.  Normally an import failure
would cause rdiff-backup to run without acl/ea support, but maybe the
--record-acls option would cause it to abort and display an error
message if it couldn't find the library.
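
A minimal sketch of that import-and-fall-back idea.  The helper names
are hypothetical, and the default module name "posix1e" is an
assumption about what the pylibacl package installs; nothing here is
existing rdiff-backup code.

```python
import importlib
import sys


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def acl_support(record_acls_requested, module_name="posix1e"):
    """Return True if ACL support can be enabled.

    module_name is an assumption (the name pylibacl is believed to
    install).  If the user explicitly asked for ACLs and the library is
    missing, abort with an error instead of silently continuing.
    """
    module = try_import(module_name)
    if module is None and record_acls_requested:
        sys.exit("--record-acls given, but module %r could not be imported"
                 % module_name)
    return module is not None
```

Without the switch the function just reports whether the library is
there, so a missing library only disables the feature.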

  GF> Also, I suspect you will need to just have --record-acls and
  GF> --record-eas.  If someone wants both, they need to ask for both.

  GF> i.e.  My primary use for this feature is to backup samba
  GF> exported Linux FSs.

  GF> For that situation, I need --record-eas xor --record-acls.

  GF> Either would work, but doing both would capture the ACLs twice.

Well, the way I understand what I was suggesting, it wouldn't be
possible to record eas and acls separately.  So if you had EAs
enabled, then you would have to have ACLs enabled, because ACLs would
just be a particular kind of EA.

  GF> Linux EAs are simple name value pairs, where the value can be
  GF> binary.

  GF> Does that make sense with your current format?

I think so.  Right now each line is a different record, so newlines
must be quoted.  Perhaps this isn't the absolutely ideal format, but I
think it is simple and "good enough".
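
To illustrate how a binary value can fit on one line, here is a small
quoting sketch.  The escaping scheme (backslash doubled, everything
outside printable ASCII written as \xNN) is an assumption chosen for
the example, not necessarily what rdiff-backup writes.

```python
def quote_value(raw):
    """Encode bytes as one printable ASCII line with no raw newlines."""
    out = []
    for byte in raw:
        if byte == 0x5C:                 # backslash is doubled
            out.append("\\\\")
        elif 0x20 <= byte < 0x7F:        # printable ASCII passes through
            out.append(chr(byte))
        else:                            # newlines, NULs, high bytes
            out.append("\\x%02x" % byte)
    return "".join(out)


def unquote_value(text):
    """Invert quote_value."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)
            i += 2
        else:                            # "\xNN"
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
    return bytes(out)
```

Because a quoted value never contains a newline, each record can still
be read back with an ordinary line-by-line loop.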

  GF> I can see that growing to hundreds of megs or more.  Wouldn't it
  GF> be slow to have to update that large a file due to a small EA
  GF> change.

  GF> You will definitely be getting lots of small changes.

It may not be as slow as it appears, and the space savings could be
worth it.  For instance the mirror_metadata file has many small
changes, but it seems appropriate to store it in one large file
instead of many small ones.  My nightly backup event runs locally,
takes about 15 minutes, and backs up about 600000 files.  It writes a
mirror_metadata file of about 8MB, or about 90MB uncompressed.  Thus
each file has about 150 bytes of metadata.  If this data were stored
in 600000 separate files, a normal (not tail packing like ReiserFS)
file system would have to spend maybe 16KB (minimum file size) *
600000 = 9.6GB!  By storing as one large file we reduce the size
required by a factor of about 1000 (compared with the 8MB compressed
file).
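
The arithmetic can be checked with a few lines.  The figures are the
ones quoted above, taken as decimal units; the variable names are only
for illustration.

```python
files = 600_000
uncompressed_bytes = 90 * 10**6    # mirror_metadata, uncompressed
compressed_bytes = 8 * 10**6       # mirror_metadata, compressed
min_file_bytes = 16 * 10**3        # assumed minimum space per small file

# Average metadata per backed-up file.
per_file_metadata = uncompressed_bytes / files

# Space needed if every file's metadata lived in its own small file.
tree_bytes = files * min_file_bytes

# How much smaller the single file is than the tree of small files.
ratio_vs_compressed = tree_bytes / compressed_bytes
ratio_vs_uncompressed = tree_bytes / uncompressed_bytes

print(per_file_metadata, tree_bytes, ratio_vs_compressed, ratio_vs_uncompressed)
```

Against the compressed file the ratio comes out at 1200; against the
uncompressed file it is a little over 100.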

Also, my machine can compress the 90MB file in about 4 seconds and
decompress it in maybe half that time.  So scanning through a, say,
300MB file would still only take a relatively small amount of time on
many systems.

  GF> For instance, I use xfsdump to backup my samba shares currently.
  GF> xfsdump accepts a dump-level as an arg.

  GF> For every file it backs up, it modifies a dump-level EA.  So if
  GF> I were to do: rdiff-backup --record-eas ...  xfsdump ...
  GF> rdiff-backup --record-eas ...

  GF> You could have thousands (tens of thousands, hundreds of
  GF> thousands) of EA changes, but no data changes.

Well, if the eas for each file don't take up much space, using rdiff
won't be very effective because its blocksize (usually 2KB) will cause
all of the file's EAs to be updated anyway.  And the speed question is
also up in the air because writing that data to a single compressed
file (even though we have to re-write some non-changed information)
could be faster than opening and closing lots of small files.
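
A rough model of the blocksize point.  This is a simplification: it
compares fixed-offset blocks of two equal-length buffers, whereas real
rdiff matches blocks with rolling checksums.  The 300-byte record size
is an assumed figure, and one byte is changed in every file's record.

```python
BLOCK_SIZE = 2048      # rdiff blocksize mentioned above
RECORD_SIZE = 300      # assumed bytes of EAs per file


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Count fixed-size blocks that differ between equal-length buffers."""
    assert len(old) == len(new)
    count = 0
    for start in range(0, len(old), block_size):
        if old[start:start + block_size] != new[start:start + block_size]:
            count += 1
    return count


def simulate(num_files, record_size=RECORD_SIZE, block_size=BLOCK_SIZE):
    """Change one byte in every record; return (changed, total) blocks."""
    old = bytearray(b"a" * (num_files * record_size))
    new = bytearray(old)
    for i in range(num_files):
        new[i * record_size] = ord("b")
    total = -(-len(old) // block_size)   # ceiling division
    return changed_blocks(bytes(old), bytes(new), block_size), total
```

With small records every block ends up dirty, so a delta saves
nothing; with records much larger than the blocksize only one block
per record changes.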

I think we may need some numbers to make an informed decision on the
single file vs tree issue.  If the average size of a file's EAs is 300
bytes, the single file seems better (but maybe I'm overlooking
something).  If each file has 40K of EAs, then the tree would win I
think.  Can you (or someone else on the list) estimate how many bytes
of extended attributes a file will have?
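
One way to get such an estimate on a Linux box is sketched below.  It
relies on os.listxattr and os.getxattr, which Python's standard library
only provides on Linux; the function names are hypothetical.  Files
whose attributes cannot be read are skipped.

```python
import os


def ea_bytes(path):
    """Total bytes of extended attribute names plus values on one path."""
    total = 0
    for name in os.listxattr(path, follow_symlinks=False):
        value = os.getxattr(path, name, follow_symlinks=False)
        total += len(os.fsencode(name)) + len(value)
    return total


def average_ea_bytes(root):
    """Return (files_examined, average_ea_bytes_per_file) under root."""
    count = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                total += ea_bytes(os.path.join(dirpath, filename))
            except OSError:
                continue   # unreadable, or no xattr support here
            count += 1
    return count, (total / count if count else 0.0)
```

Running average_ea_bytes() over a samba share would give a number to
compare against the 300 byte and 40K cases above.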

Actually, I just had an idea:  If we use the one file way, maybe
rdiff-backup could store extended attribute information in a way
compatible with the getfattr/setfattr utilities.  I've never used
them, but I notice that getfattr has a --recursive switch...
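
A sketch of what one record might look like in that style.  It assumes
the dump layout is a "# file: path" line followed by name=value lines
with values hex-encoded as 0x..., and a blank line between files; this
should be checked against the getfattr and setfattr man pages.  No
attempt is made to escape special characters in paths, so the sketch
refuses them.

```python
def dump_record(path, attrs):
    """Format one file's EAs in a getfattr-like dump layout (assumed).

    attrs maps attribute names (str) to values (bytes).
    """
    if "\n" in path or "\\" in path:
        raise ValueError("path escaping is not handled in this sketch")
    lines = ["# file: " + path]
    for name in sorted(attrs):
        # Hex encoding keeps binary values on one line.
        lines.append("%s=0x%s" % (name, attrs[name].hex()))
    return "\n".join(lines) + "\n\n"
```

Concatenating such records for every file would give the single large
file discussed above.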


-- 
Ben Escoto
