From: Tim Kientzle
Subject: Re: [Bug-tar] Interchange/performance issue with archive containing sparse file
Date: Sun, 6 Feb 2011 10:03:40 -0800

On Feb 6, 2011, at 6:35 AM, Joerg Schilling wrote:
> address@hidden wrote:
> 
>> There seems to be an interchange problem between star and GNU tar,
>> possibly related to sparse file support. Or perhaps this is just a bug in
>> GNU tar?
> 
> If GNU tar archives sparse files, it creates archives that violate the 
> POSIX structuring conventions for TAR archives. 

The newer GNU tar --posix support addresses this, though
it's not (yet?) the default format for GNU tar.  I think its
current "1.0" sparse-file variant is pretty well thought out, though
I do have a couple of small quibbles. ;-)

Libarchive now supports the GNU tar --posix "1.0" variant when
writing sparse files.
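For readers unfamiliar with the "1.0" layout: as the GNU tar manual describes it, the pax header carries GNU.sparse.major=1, GNU.sparse.minor=0, GNU.sparse.name and GNU.sparse.realsize. The entry's data then begins with a sparse map of newline-separated decimal numbers (the entry count, then an offset/size pair per data region), padded with NULs to a 512-byte block, followed by the non-hole data itself. The Python below is a minimal sketch of encoding those two pieces, written from that description rather than from GNU tar's or libarchive's source; the function names are hypothetical.

```python
BLOCK = 512  # tar block size in bytes


def pax_record(key: str, value: str) -> bytes:
    """Encode one pax extended-header record: "<len> <key>=<value>" plus newline.

    <len> counts the whole record including its own digits, so it is
    recomputed until adding the digits no longer changes it.
    """
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body) + 1
    while length != len(body) + len(str(length)):
        length = len(body) + len(str(length))
    return str(length).encode("ascii") + body


def sparse_1_0_pax_records(real_name: str, real_size: int) -> bytes:
    """pax records marking an entry as a GNU sparse 1.0 file (hypothetical helper)."""
    records = [
        ("GNU.sparse.major", "1"),
        ("GNU.sparse.minor", "0"),
        ("GNU.sparse.name", real_name),
        ("GNU.sparse.realsize", str(real_size)),
    ]
    return b"".join(pax_record(k, v) for k, v in records)


def sparse_1_0_map(segments: list[tuple[int, int]]) -> bytes:
    """Encode (offset, size) data segments as the 1.0 in-data sparse map.

    Newline-separated decimals: the entry count, then offset and size for
    each entry, padded with NULs to a whole number of 512-byte blocks.
    """
    fields = [len(segments)]
    for offset, size in segments:
        fields += [offset, size]
    text = "".join(f"{n}\n" for n in fields).encode("ascii")
    return text + b"\0" * (-len(text) % BLOCK)
```

Because the map lives in the entry's data rather than in vendor-specific header fields, a POSIX reader that knows nothing about sparse files still sees a well-formed entry; it just extracts the map and the packed data as an ordinary file.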

> address@hidden wrote:
> 
> Currently, tar seems to perform quite sub-optimally when archiving sparse
> files. I compared the performance of GNU tar and star when archiving a
> large (~2TB) sparse file. All but about 180MB of the file was holes.

You didn't specify what operating system you were using.
Some operating systems do have support for locating holes,
but it varies a lot.  Joerg has done a lot of performance work
on star, so it's entirely possible that he's optimized hole support
for your OS and the GNU tar folks haven't (yet) done so.
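One portable-ish interface for locating holes is lseek() with SEEK_DATA and SEEK_HOLE, which originated in Solaris and is also available on Linux and FreeBSD. The Python below is a minimal sketch of enumerating a file's data extents that way, assuming a Python build that exposes os.SEEK_DATA. It is not how star or GNU tar actually find holes, and the function name is hypothetical.

```python
import errno
import os


def data_extents(path: str) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for the non-hole regions of a file.

    Uses lseek(SEEK_DATA)/lseek(SEEK_HOLE). Extents come back at the
    filesystem's allocation granularity, so they may be larger than the
    bytes actually written. A filesystem without hole tracking reports
    the whole file as a single data extent.
    """
    size = os.path.getsize(path)
    if not hasattr(os, "SEEK_DATA"):
        # No hole-finding interface in this build: treat everything as data.
        return [(0, size)] if size else []
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains after pos
                    break
                raise
            # There is always an implicit hole at end of file, so this
            # returns a value <= size.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents
```

For the ~2TB file described above, a loop like this touches only the extents holding the ~180MB of data, whereas an archiver without such an interface has to read (or at least scan) every byte to discover the holes.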

> address@hidden wrote:
>
> In future, the tar file format could be updated to allow sparse files to
> be archived in a single pass, but it would require ...

I've considered approaches like this for libarchive, but
I haven't found the time to experiment with them.

Specifically, this could be done without seeking
(and without completely ignoring the standards)
by recording a complete tar entry for each "packet" of a file
(where a packet encodes a hole and a block of data
following the hole).  GNU tar does something conceptually
similar for its multi-volume support: it writes a single file as
multiple entries in the archive.  Using pax extensions,
the overhead for this approach would be 1536+ bytes
for each "packet", or about 0.015% with 10MB packets.
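A short sketch can make the "packet" idea concrete. Everything here is hypothetical: it is not an existing libarchive or GNU tar feature, and the function names are invented. PER_PACKET_HEADER assumes the 1536 bytes are one ustar header for the pax 'x' entry, one 512-byte block of pax records, and the ustar header for the packet itself. That breakdown is a guess at what the figure counts. The code cuts a file's data extents into packets of at most max_data bytes, each a hole followed by data, and computes header-plus-padding overhead relative to the data stored.

```python
BLOCK = 512
# Assumed per-packet cost: pax 'x' header + one block of pax records
# + the packet's own ustar header.
PER_PACKET_HEADER = 3 * BLOCK


def split_into_packets(extents, real_size, max_data):
    """Hypothetical: turn (offset, length) data extents into packets.

    Each packet is (start, hole_len, data_len): starting at file offset
    `start`, hole_len bytes of hole followed by data_len bytes of data.
    Extents longer than max_data are split across several packets, and a
    trailing hole becomes a final packet with no data.
    """
    packets = []
    pos = 0  # file offset where the next packet begins
    for offset, length in extents:
        hole = offset - pos
        while length > 0:
            chunk = min(length, max_data)
            packets.append((pos, hole, chunk))
            pos += hole + chunk
            length -= chunk
            hole = 0  # later chunks of the same extent follow directly
    if pos < real_size:
        packets.append((pos, real_size - pos, 0))
    return packets


def overhead_fraction(packets):
    """Header and 512-byte padding bytes divided by data bytes stored."""
    data = sum(d for _, _, d in packets)
    if data == 0:
        raise ValueError("no data to measure overhead against")
    headers = PER_PACKET_HEADER * len(packets)
    padding = sum(-d % BLOCK for _, _, d in packets)
    return (headers + padding) / data
```

With a single 10 MiB packet this gives 1536 / 10,485,760 ≈ 0.0146%, consistent with the roughly 0.015% figure above. Each packet is emitted as soon as its data has been read, so the writer never needs to seek back and patch an earlier header.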

Cheers,

Tim



