I have lots of large .tar.gz files and I need to extract just a single (small) file from each. I purposely put that file at the front of the .tar archive so that extraction is fast, but if the archive is compressed, then 'tar' wants to read the whole .tgz file before exiting.
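(For reference, this is roughly how such an archive is built, reduced to a tiny synthetic example with made-up file names:)

```shell
# Tiny synthetic example (hypothetical names): store the small file
# first so a sequential reader reaches it immediately.
mkdir -p demo/big
echo "wanted" > demo/README
head -c 1048576 /dev/zero > demo/big/filler.bin  # 1 MiB of padding
tar czf demo.tgz -C demo README big              # README is archived first
tar tzf demo.tgz | head -1                       # -> README
```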
To illustrate this phenomenon, consider the following:
1) regular extract
desktop1:/tmp$ time dd if=linux-3.4.2.tar.bz2 bs=1k|bunzip2|tar x linux-3.4.2/Documentation/ABI/README
78284+1 records in
78284+1 records out
80162970 bytes (80 MB) copied, 8.96967 s, 8.9 MB/s
real 0m8.983s
user 0m9.057s
sys 0m0.549s
* performance is the same for "tar jxf linux-3.4.2.tar.bz2 linux-3.4.2/Documentation/ABI/README"
2) crude but fast extract
desktop1:/tmp$ time dd if=linux-3.4.2.tar.bz2 bs=1k count=1000|bunzip2|tar x linux-3.4.2/Documentation/ABI/README
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 0.0980247 s, 10.4 MB/s
bunzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bunzip2: Inappropriate ioctl for device
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
/tmp/tar: Unexpected EOF in archive
/tmp/tar: Error is not recoverable: exiting now
real 0m0.105s
user 0m0.104s
sys 0m0.009s
As you can see, using method (2) I can still extract the single file, in 0.1 seconds instead of 8.9 seconds. It looks to me like 'tar' keeps reading the whole archive from stdin even after it is done extracting.
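Method (2) can be reproduced without the kernel tarball; a small synthetic .tgz (hypothetical names) shows the same behaviour:

```shell
# Build a small archive with the wanted member first, then feed tar only
# a truncated prefix of the compressed stream. gunzip and tar both
# complain about the cut-off, but the front member is already extracted.
mkdir -p src out
echo "hello" > src/wanted.txt
head -c 4194304 /dev/urandom > src/filler.bin   # ~4 MiB, barely compressible
tar czf src.tgz -C src wanted.txt filler.bin
dd if=src.tgz bs=1k count=64 2>/dev/null | gunzip 2>/dev/null \
  | tar -x -f - -C out wanted.txt 2>/dev/null || true
cat out/wanted.txt                              # -> hello
```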
Q: is there any special option to make this fast? If not, this would be a really good enhancement (I have seen a lot of people asking for it on the web). If someone can post a patch to fix this behaviour, that would be really nice. I spent some time reading the tar source code, but things aren't looking obvious to me.
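One possibly related option I noticed in the manual is GNU tar's --occurrence, which is documented to stop processing once the requested member has been seen. I have not measured whether it also stops the decompressor early on a large .tgz, but here is a minimal sanity check on a synthetic archive (hypothetical names):

```shell
# Sanity check of --occurrence=1 (GNU tar); whether it short-circuits
# decompression on a large compressed archive still needs timing.
mkdir -p s o
echo "hi" > s/first.txt
head -c 1048576 /dev/zero > s/rest.bin
tar czf a.tgz -C s first.txt rest.bin
tar xzf a.tgz --occurrence=1 -C o first.txt
cat o/first.txt                                 # -> hi
```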
Thanks,
Dilip