Re: [Bug-tar] Sparse file performance and suggestions
From: Joerg Schilling
Subject: Re: [Bug-tar] Sparse file performance and suggestions
Date: Sun, 06 Feb 2011 16:49:45 +0100
User-agent: nail 11.22 3/20/05
address@hidden wrote:
> Currently, tar seems to perform quite sub-optimally when archiving sparse
> files. I compared the performance of GNU tar and star when archiving a
> large (~2TB) sparse file. All but about 180MB of the file was holes.
>
> The archives created by star and GNU tar were of identical size, and both
> programs could extract each correctly. Extraction times were similar (both
> under 4 seconds). However, GNU tar took about 2.8 times as long as star to
> create the archive.
I assume that you are testing on an OS that does not implement
SEEK_HOLE/SEEK_DATA, as star is even faster than GNU tar when the OS
helps to retrieve the sparse file info.
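To illustrate what "the OS helps" means here: where lseek() supports SEEK_DATA and SEEK_HOLE, a program can enumerate the data regions of a file without reading through the holes. The following is only a rough sketch in Python, not code from star or GNU tar. os.SEEK_DATA and os.SEEK_HOLE are real constants on platforms that provide them; the function name data_extents and the whole-file fallback are assumptions made for this example.

```python
import errno
import os


def data_extents(path):
    """Return a list of (offset, length) data regions of the file at path.

    Hypothetical helper for illustration only. If the platform lacks
    SEEK_DATA/SEEK_HOLE, the whole file is reported as one data region.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return [(0, size)]  # assumption: no OS help, treat everything as data

    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                # Jump to the next byte at or after pos that is not in a hole.
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # only a hole remains between pos and end of file
                raise
            # Jump to the next hole at or after start (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem that does not track holes, the calls still succeed and the file is simply reported as a single data region, so the cost of scanning a mostly empty 2 TB file depends heavily on whether the filesystem can answer these queries directly.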
> In future, the tar file format could be updated to allow sparse files to
> be archived in a single pass, but it would require that the archive be
> seekable. Or alternatively, tar would need a buffer at least as big as the
> largest non-hole region. (Extraction wouldn't need a seekable archive.)
The tar archive format is based on the assumption that it is written to
non-seekable media. You need to know the archived size of a file in order to
write the tar header, and you cannot know that before you have scanned the holes.
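The ordering constraint can be shown with a small arithmetic sketch. It assumes a deliberately simplified model in which only the data regions are stored and the payload is padded to 512-byte tar blocks; real sparse formats also store a hole map, which is ignored here. The names archived_size and padded_size and the example extent list are made up for illustration.

```python
BLOCK = 512  # size of one tar block in bytes


def archived_size(extents):
    """Bytes of file content stored in the archive: data regions only.

    extents is a list of (offset, length) pairs, known only after the
    file has been scanned for holes.
    """
    return sum(length for _offset, length in extents)


def padded_size(nbytes, block=BLOCK):
    """Round nbytes up to a whole number of tar blocks."""
    return -(-nbytes // block) * block


# Hypothetical result of a hole scan: 4096 bytes of data at the start of
# the file and 1000 bytes of data at the 1 TiB mark.
extents = [(0, 4096), (1 << 40, 1000)]

size_for_header = archived_size(extents)
print(size_for_header)               # 5096
print(padded_size(size_for_header))  # 5120
```

The value that goes into the header depends on every extent, so on a non-seekable archive the header cannot be written until the scan has finished.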
For further thoughts on the archive format, it makes sense to rethink how GNU
tar currently archives sparse files. The format currently used by GNU tar takes
much more program data space than needed, and it is limited in the maximum number of
holes that can be archived.
As the hole data is no longer block aligned, it cannot be read blockwise and
needs a big malloc(). As the sparse meta data is stored inside the POSIX meta
data area, it is limited to 8 GB. This limits the number of hole/data pairs, and
a file with maximum holeyness may not be larger than a few TB.
Jörg
--
EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
address@hidden (uni)
address@hidden (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily