From: Joerg Schilling
Subject: Re: [Bug-tar] Interchange/performance issue with archive containing sparse file
Date: Sun, 06 Feb 2011 22:56:23 +0100
User-agent: nail 11.22 3/20/05
Tim Kientzle <address@hidden> wrote:
> >
> > If GNU tar archives sparse files, it creates archives that violate the
> > POSIX structuring conventions for TAR archives.
>
> The newer GNU tar --posix support addresses this, though
> it's not (yet?) the default format for GNU tar. I think the
> current "1.0" variant is pretty well thought out (though I do have
> a couple of small quibbles. ;-)
>
> Libarchive now supports the GNU tar --posix "1.0" variant when
> writing sparse files.
I am not sure what you mean by POSIX version 1.0. The first GNU tar
implementation that moved the hole description data into the POSIX extended
headers created no parsing problems, because no huge amounts of xheader data
needed to be allocated for parsing, but it was in conflict with the POSIX rules
for xheaders, as it _repeated_ line pairs like:
16 GNU.xxx.hole=123456
17 GNU.xxx.data=1234567
but the POSIX standard says that in case of repeated entries, the last one is
valid.
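For illustration, here is a minimal sketch (written for this mail, not code
from star or GNU tar) of an xheader reader; the GNU.xxx.* keywords follow the
placeholder above, and the point is that a reader which obeys the "last one is
valid" rule keeps only the final hole/data pair:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Minimal sketch of a POSIX pax xheader reader.  Each record is
 * "length keyword=value\n", where the decimal length covers the
 * whole record including itself and the newline.  A conforming
 * reader lets a later record override an earlier one with the same
 * keyword, so repeated hole/data pairs collapse into the last pair.
 */
static int
kw_is(const char *p, const char *eq, const char *kw)
{
        return ((size_t)(eq - p) == strlen(kw) && memcmp(p, kw, eq - p) == 0);
}

static void
parse_xheader(const char *buf, size_t len)
{
        char hole[32] = "", data[32] = "";
        size_t pos = 0;

        while (pos < len) {
                char *p;
                unsigned long rlen = strtoul(buf + pos, &p, 10);
                const char *rend = buf + pos + rlen;
                const char *eq;

                if (rlen == 0 || rend > buf + len)
                        break;                  /* malformed record */
                p++;                            /* skip blank after length */
                eq = memchr(p, '=', rend - p);
                if (eq == NULL)
                        break;
                if (kw_is(p, eq, "GNU.xxx.hole"))
                        snprintf(hole, sizeof (hole), "%.*s",
                            (int)(rend - 1 - (eq + 1)), eq + 1);
                else if (kw_is(p, eq, "GNU.xxx.data"))
                        snprintf(data, sizeof (data), "%.*s",
                            (int)(rend - 1 - (eq + 1)), eq + 1);
                pos += rlen;
        }
        printf("surviving pair: hole=%s data=%s\n", hole, data);
}

int
main(void)
{
        const char hdr[] =
            "23 GNU.xxx.hole=123456\n"
            "24 GNU.xxx.data=1234567\n"
            "23 GNU.xxx.hole=345678\n"
            "24 GNU.xxx.data=3456789\n";

        parse_xheader(hdr, sizeof (hdr) - 1);   /* prints the last pair only */
        return (0);
}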
I assume that the current variant thus cannot be called "1.0". It is different
and IIRC, it contains a very long line of hole/data pairs. This is neither
easy to read (star would need to malloc space for the maximum size of the
xheader, as the data is not block oriented), nor does it allow archiving larger
sparse files. Note that the max. size of an xheader is 8 GB. Note also that this
would still not allow a 32 bit tar program to handle the max. size, as a
32 bit process cannot grow to even 4 GB.
A file with maximum sparseness can thus currently grow only to approx. 3 TB
before it is no longer archivable by GNU tar, even with a 64 bit binary.
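As a rough back-of-envelope check of that bound (the 4 KiB granularity and the
~24 bytes of decimal text per offset/length pair below are only assumptions for
illustration, not numbers taken from the GNU tar sources):

#include <stdio.h>

/*
 * Back-of-envelope check of the "approx. 3 TB" bound.  Assumptions
 * (illustration only): the sparse map is written as decimal
 * offset/length pairs of roughly 24 bytes of text each, and a
 * maximally sparse file alternates 4 KiB holes with 4 KiB data
 * blocks, so one pair describes about 8 KiB of the file.
 */
int
main(void)
{
        double xheader_max = 8e9;       /* 8 GB xheader size limit    */
        double text_per_pair = 24;      /* assumed bytes of map text  */
        double file_per_pair = 8192;    /* 4 KiB hole + 4 KiB data    */

        printf("max sparse file: approx. %.1f TB\n",
            xheader_max / text_per_pair * file_per_pair / 1e12);
        return (0);
}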
If I compare the currently available methods for handling the sparse data, the
method currently used by star still seems to be the best:
- The data is block oriented and thus can be read on the fly, without a
  need to malloc() space for the ASCII parse data.
- The base 256 format I introduced in the mid 1990s is smaller than
  archiving the numbers as decimal strings (see the sketch after this list).
- The base 256 format still allows 95 bits for the file size, which is
  sufficient for any local storage in a single universe, as this would
  take approx. 1 MegaMol of active storage (net) mass if one bit takes
  one atom.
- It is located in the file data space and thus unlimited in size.
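The base 256 encoding itself is simple; here is a minimal sketch (written for
this mail, not the star sources): the first byte of the numeric field gets the
0x80 marker bit and the value is stored big-endian in the remaining bits, which
leaves 12*8 - 1 = 95 usable bits in a 12 byte size field.

#include <stdio.h>
#include <stdint.h>

/*
 * Minimal sketch of the base 256 numeric encoding for tar header
 * fields: set the 0x80 marker bit in the first byte and store the
 * value big-endian in the remaining bits.
 */
static void
to_base256(uint64_t value, unsigned char *field, size_t len)
{
        size_t i;

        for (i = len; i-- > 0; ) {
                field[i] = value & 0xFF;
                value >>= 8;
        }
        field[0] |= 0x80;               /* base 256 marker bit */
}

int
main(void)
{
        unsigned char size[12];
        size_t i;

        /* 10 TB, far beyond the 8 GB limit of 11 octal digits. */
        to_base256(10ULL * 1000 * 1000 * 1000 * 1000, size, sizeof (size));

        for (i = 0; i < sizeof (size); i++)
                printf("%02x ", (unsigned)size[i]);
        printf("\n");
        return (0);
}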
I am not sure whether the current GNU tar sparse format will last for long,
and this is why I am not sure whether I should implement support for it.
> > In future, the tar file format could be updated to allow sparse files to
> > be archived in a single pass, but it would require ...
>
> I've considered approaches like this for libarchive, but
> I haven't found the time to experiment with them.
>
> Specifically, this could be done without seeking
> (and without completely ignoring the standards)
> by recording a complete tar entry for each "packet" of a file
If the file is archived in chunks, it could be done without seeks.
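Purely as an illustration of the chunked idea (this is not an existing format,
and the header writer below is only a placeholder): buffer the file in fixed
size chunks and emit one complete entry per chunk, so every header is written
after its chunk is fully known and nothing ever has to be seeked back and
patched:

#include <stdio.h>

/*
 * Illustration only, not an existing tar format: archive a stream of
 * unknown size in fixed size chunks, one complete entry per chunk.
 * Each chunk is read fully before its header is written, so the
 * header can carry the exact chunk size and no backward seek is
 * needed to patch it later.
 */
#define CHUNK   (1024 * 1024)

static void
write_entry_header(FILE *out, const char *name, int part, size_t size)
{
        /* Placeholder for real (ustar/pax) header generation. */
        fprintf(out, "=== %s.part%d, %zu bytes ===\n", name, part, size);
}

static void
archive_chunked(FILE *in, FILE *out, const char *name)
{
        static char buf[CHUNK];
        size_t n;
        int part = 0;

        while ((n = fread(buf, 1, sizeof (buf), in)) > 0) {
                write_entry_header(out, name, part++, n);
                fwrite(buf, 1, n, out);
        }
}

int
main(void)
{
        archive_chunked(stdin, stdout, "bigfile");
        return (0);
}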
BTW: Star has implemented a fifo for more than 20 years, and because of this
fifo it cannot easily be upgraded to support seeking.
Jörg
--
EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
address@hidden (uni)
address@hidden (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily