Re: [Bug-tar] Sparse file performance and suggestions
From: Eric Blake
Subject: Re: [Bug-tar] Sparse file performance and suggestions
Date: Mon, 07 Feb 2011 09:09:32 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
[adding coreutils]
On 02/06/2011 08:49 AM, Joerg Schilling wrote:
>> The archives created by star and GNU tar were of identical size, and both
>> programs could extract each correctly. Extraction times were similar (both
>> under 4 seconds). However, GNU tar took about 2.8 times as long as star to
>> create the archive.
>
> I assume that you are testing on an OS that does not implement
> SEEK_HOLE/SEEK_DATA, as star is even faster than GNU tar when the OS
> helps to retrieve the sparse file info.
Solaris has SEEK_HOLE, although the new coreutils cp code for using
efficient sparse traversal has not yet been ported to use it. It's on
the list of things to implement (and has been for more than 6 months).
Linux has ioctl(FIEMAP), which is just as efficient as Solaris'
SEEK_HOLE, and coreutils 8.10 is the first GNU program to use it. We
are planning on moving coreutils' sparse traversal into gnulib (where it
can be used by tar), as well as enhancing it to recognize Solaris'
SEEK_HOLE. At that point, GNU tar should indeed be faster at recognizing
already sparse files, and you will indeed want a new tuning option that
controls how tar detects holes. One choice is to faithfully copy the
existing holes of the source files (faster, but overlooks non-sparse
blocks of 0s that could have been made sparse). The other is to find all
runs of 0s (slower, and the current behavior; it can result in a sparser
copy than the original, and can still be made somewhat faster by
skipping known holes).
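
To make the two strategies concrete, here is a rough sketch. It is in
Python rather than the C that tar and coreutils use, and it is not the
code from either project. The function names data_extents and
nonzero_extents are made up for illustration, and the 512-byte block
size is only an assumed default. The first function asks the kernel for
the existing data regions via SEEK_DATA/SEEK_HOLE; the second reads
every byte and reports only blocks that contain something other than 0.

```python
import errno
import os


def data_extents(path):
    """Return [(offset, length), ...] for regions the OS reports as data.

    Uses lseek() with SEEK_DATA/SEEK_HOLE. Where Python does not expose
    those constants, the whole file is reported as one data extent.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
            return [(0, size)] if size else []
        extents = []
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:  # defensive: never loop without progress
                break
            extents.append((start, end - start))
            offset = end
        return extents
    finally:
        os.close(fd)


def nonzero_extents(path, block_size=512):
    """Return merged [(offset, length), ...] of blocks with any non-zero byte.

    Reads the whole file, so it is slower, but it also finds runs of 0s
    that are stored on disk as ordinary data.
    """
    extents = []
    zeros = bytes(block_size)
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block != zeros[:len(block)]:
                if extents and extents[-1][0] + extents[-1][1] == offset:
                    # Adjacent to the previous extent: grow it.
                    prev_off, prev_len = extents[-1]
                    extents[-1] = (prev_off, prev_len + len(block))
                else:
                    extents.append((offset, len(block)))
            offset += len(block)
    return extents
```

On a file whose zero blocks were written out explicitly, data_extents
would typically report those blocks as data while nonzero_extents skips
them, which is the "sparser copy than the original" case. A combined
approach could run the zero scan only inside the ranges that
data_extents returns.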
> For further thoughts on the archive format, it makes sense to rethink how GNU
> tar currently archives sparse files. The format currently used by GNU tar takes
> much more program data space than needed, and it is limited in the maximum
> number of holes that can be archived.
>
> As the hole data is no longer block aligned, it cannot be read blockwise and
> needs a big malloc(). As the sparse meta data is stored inside the POSIX meta
> data area, it is limited to 8 GB. This limits the number of hole/data pairs,
> and a file with maximum holeyness may not be larger than a few TB.
These points are also true, but are independent of whether coreutils'
sparse traversal code is moved over to gnulib and tar taught to take
advantage of sparse traversal.
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org