Re: [Lzip-bug] Tarball indexing and plzip
From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Tarball indexing and plzip
Date: Sun, 10 Mar 2019 16:53:54 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Hello Dennis,
Dennis Katsonis wrote:
> I was wondering whether it would be difficult or not, to add
> functionality to plzip, or create a variant of it, which had tarball
> indexing capabilities like pixz.
I am in the process of implementing something like that, and more, but
in tarlz, not in plzip: http://www.nongnu.org/lzip/tarlz.html
> Pixz allows a more random access to the compressed tarball. Listing is
> very quick, and even extracting a file at the end of a large tarball is
> quite fast, not too much slower than extracting it from an uncompressed,
> indexed tarball. A major advantage when extracting select files from an
> archived compressed tarball.
Tarlz is not complete yet, but it can already list quite quickly if the
archive is created with the right options[1]. Parallel extraction should
be similarly quick once it is implemented.
[1] http://www.nongnu.org/lzip/manual/tarlz_manual.html#Multi_002dthreaded-tar
If the files in the archive are large, multi-threaded '--list' on a
regular (seekable) tar.lz archive can be hundreds of times faster than
sequential '--list' because, in addition to using several processors, it
only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
    tarlz -9 --no-solid -cf silesia.tar.lz silesia
    time lzip -cd silesia.tar.lz | tar -tf -     (5.032s)
    time plzip -cd silesia.tar.lz | tar -tf -    (3.256s)
    time tarlz -tf silesia.tar.lz                (0.020s)
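
The principle behind the fast listing can be illustrated with a small
Python sketch. It is not tarlz code and it does not use the lzip format:
as an assumption, each file's tar header and data are compressed as an
independent xz stream using Python's standard lzma module, and the member
offsets are simply kept in a list (the names make_member, build_archive
and list_names are hypothetical). Listing then decompresses only the first
512-byte tar header of each member instead of the whole archive.

```python
import lzma
import tarfile

BLOCK = 512  # tar block size in bytes


def make_member(name, data):
    """Compress one tar header plus its padded data as an independent stream.

    Assumption: xz streams stand in for lzip members in this sketch.
    """
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)  # one 512-byte header
    padding = b"\0" * (-len(data) % BLOCK)            # pad data to a full block
    return lzma.compress(header + data + padding)


def build_archive(files):
    """Concatenate the members; return the archive and each member's offset.

    The end-of-archive zero blocks of a real tar file are omitted here.
    """
    archive = bytearray()
    offsets = []
    for name, data in files:
        offsets.append(len(archive))
        archive += make_member(name, data)
    return bytes(archive), offsets


def list_names(archive, offsets):
    """List (name, size) pairs, decompressing only 512 bytes per member."""
    entries = []
    ends = offsets[1:] + [len(archive)]
    for start, end in zip(offsets, ends):
        decomp = lzma.LZMADecompressor()
        # max_length stops decompression once the tar header is available.
        header = decomp.decompress(archive[start:end], max_length=BLOCK)
        info = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
        entries.append((info.name, info.size))
    return entries


if __name__ == "__main__":
    files = [("a.txt", b"x" * 100000), ("dir/b.bin", b"y" * 3)]
    archive, offsets = build_archive(files)
    for name, size in list_names(archive, offsets):
        print(name, size)
```

Because every member can be decoded independently, the loop in list_names
could also be split across several threads, which is the other source of
speedup mentioned in the manual excerpt above.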
I expect that tarlz, or something based on the same principles, will
obsolete conventionally compressed tar archives.
Best regards,
Antonio.