bug-tar


Re: [Bug-tar] Interchange/performance issue with archive containing spar


From: Joerg Schilling
Subject: Re: [Bug-tar] Interchange/performance issue with archive containing sparse file
Date: Tue, 08 Feb 2011 00:18:44 +0100
User-agent: nail 11.22 3/20/05

Tim Kientzle <address@hidden> wrote:

> >> Libarchive now supports the GNU tar --posix "1.0" variant when
> >> writing sparse files.
> > 
> > I am not sure what you understand by posix version 1.0. The first GNU tar
> > implementation that did move the hole description data into the POSIX
> > extended headers ...
>
> This is the format documented by GNU tar as their "0.0" sparse variant.
>
> > I assume that the current variant thus cannot be called "1.0". It is
> > different and, IIRC, it contains a very long line of hole/data pairs.
>
> This is the format documented by GNU tar as their "0.1" sparse variant.
>
> The current format (which GNU tar documents as the "1.0" sparse format)
> stores the hole/data information in a block prepended to the file data.
> In particular, this makes it possible to extract a sparse file
> to disk with a tar program that does not understand the
> extension, then post-process the files on disk.
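
A minimal sketch of the post-processing step described above, assuming the
map layout as the GNU tar manual describes the "1.0" variant: newline-terminated
decimal numbers (an entry count, then offset/size pairs), padded to the next
512-byte boundary and followed by the data segments back to back. The function
names are hypothetical, and the real file size is passed in by the caller
because the wrapped file on disk does not carry it (in the archive it lives in
a pax header).

```python
BLOCK = 512  # tar block size; the map is padded to a multiple of this


def parse_sparse_map(blob):
    """Return (entries, data_start) for bytes that begin with a 1.0-style map."""
    pos = 0

    def next_number():
        nonlocal pos
        end = blob.index(b"\n", pos)
        value = int(blob[pos:end])
        pos = end + 1
        return value

    count = next_number()
    entries = []
    for _ in range(count):
        offset = next_number()
        size = next_number()
        entries.append((offset, size))
    # Round the end of the map up to the next block boundary.
    data_start = -(-pos // BLOCK) * BLOCK
    return entries, data_start


def expand(blob, real_size):
    """Rebuild the logical file contents; holes read back as NUL bytes."""
    entries, pos = parse_sparse_map(blob)
    out = bytearray(real_size)
    for offset, size in entries:
        out[offset:offset + size] = blob[pos:pos + size]
        pos += size
    return bytes(out)
```

This rebuilds the contents in memory only to keep the sketch short; a real
tool would seek over the holes so that the output file stays sparse.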

Good to hear, so it seems that the limitations have been removed again.

> (My one real quibble:  When extracted by tar programs that
> don't understand the extension, the resulting files on disk
> lack any file signature to indicate the wrapper format.)

Star has been able to distinguish all known tar formats for a very long time,
and star has only introduced new formats that include fingerprints that allow
the format type to be recognized. The first star-specific format was introduced
in 1985; it allowed archiving any file type and all UNIX time stamps, and it
carried a star-specific "tar" signature at the end of the header.

GNU tar started as PD tar in December 1986 and reached its first larger
audience at the Sun User Group meeting in December 1987 in the Fairmount Hilton
in San Jose as SUG-tar. It apparently used the first POSIX.1-1988 draft (I did
not have access to that draft). GNU tar introduced "ustar  \0" as its magic,
while POSIX.1-1988 introduced "ustar\000". This was a lucky coincidence, as it
made it easy to distinguish the non-POSIX GNU tar archive format from POSIX.
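
The distinction can be sketched as a check on the eight bytes at offset 257 of
a header block. This reads "ustar\000" as "ustar", a NUL, and the version
digits "00", and "ustar  \0" as "ustar", two spaces, and a NUL; the function
name and return labels are hypothetical. As the following paragraph explains,
the magic alone stopped being a reliable indicator once GNU tar switched to
the POSIX magic.

```python
MAGIC_OFFSET = 257  # start of the magic field in a 512-byte tar header

POSIX_MAGIC = b"ustar\x00" + b"00"  # magic "ustar\0" followed by version "00"
OLD_GNU_MAGIC = b"ustar  \x00"      # "ustar", two spaces, NUL


def classify_magic(header):
    """Label a header block by its magic/version bytes only."""
    field = bytes(header[MAGIC_OFFSET:MAGIC_OFFSET + 8])
    if field == POSIX_MAGIC:
        return "posix"
    if field == OLD_GNU_MAGIC:
        return "old-gnu"
    return "unknown"
```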

Around the year 2000, someone changed GNU tar to use the POSIX magic although
the format remained non-POSIX. This made it hard for star to correctly unpack
GNU tar archives.

Then we started with POSIX.1-2001 extended headers, and Glenn Fowler proposed
using a "global" extended header to mark the archive format. Star adopted this
idea, and star extensions introduced since then are marked via a

SCHILY.archtype=string

tag. I thought that it would be sufficient to know the archive type and that
people would unpack the still-packed sparse files directly after the extract
operation has been done.
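
For illustration, a pax extended header is a sequence of records of the form
"LENGTH KEYWORD=VALUE\n", where LENGTH is the decimal size of the whole record
including the length digits themselves. The sketch below builds and parses
such records; the helper names are hypothetical and the value "exustar" is
only a placeholder, so check the star documentation for the values it really
writes.

```python
def make_record(key, value):
    """Build one pax record; the length field counts its own digits."""
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


def parse_records(data):
    """Parse concatenated pax records into a dict of keyword -> value."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # The record ends with a newline, which is not part of the value.
        record = data[space + 1:pos + length - 1]
        key, _, value = record.partition(b"=")
        records[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records


header = make_record("SCHILY.archtype", "exustar")  # placeholder value
archive_type = parse_records(header).get("SCHILY.archtype")
```

A reader that finds no such keyword in the global header would have to fall
back to other fingerprints, such as the magic field.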

If you would like to introduce a format marker in the sparse format, I would 
rather propose to standardize on the sparse file coding.

When I introduced sparse file support in star in 1994, I thought about two
possible coding methods:

1)      The one used by GNU tar. This is better for larger hole/data chunks

2)      a bitmap format. This is better for files with a high hole density.

With the idea recently introduced, it may be possible to combine both variants 
depending on the appearance of the current area. Maybe, we should try to 
standardize such a format?
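
The trade-off between the two codings can be made concrete with a rough size
estimate. This is only a sketch under stated assumptions, not the encoding
used by star or GNU tar: each extent is assumed to cost two 8-byte numbers,
the bitmap is assumed to use one bit per 512-byte block, and all names are
hypothetical.

```python
def extent_cost(extents, bytes_per_number=8):
    """Bytes needed for a list of (offset, size) pairs."""
    return len(extents) * 2 * bytes_per_number


def bitmap_cost(region_size, block_size=512):
    """Bytes needed for one bit per block, rounded up."""
    blocks = -(-region_size // block_size)
    return -(-blocks // 8)


def pick_encoding(extents, region_size, block_size=512):
    """Choose the cheaper description for one file or one region of a file."""
    if extent_cost(extents) <= bitmap_cost(region_size, block_size):
        return "extents"
    return "bitmap"
```

With these assumptions, a 1 GiB region holding two data chunks is far cheaper
as an extent list, while a 1 MiB region that alternates data and holes every
512 bytes is cheaper as a bitmap. Applying the choice per region rather than
per file would give the combined scheme suggested above.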


> It has been a while since I looked at the star approach, but
> it sounds very similar to the current GNU tar approach.

Star just kept the old basic idea from GNU tar. 

Jörg

-- 
 EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
       address@hidden                (uni)  
       address@hidden (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


