bug-tar

Re: [PATCH 1/2] add option to set the offset of the archive


From: Matteo Croce
Subject: Re: [PATCH 1/2] add option to set the offset of the archive
Date: Fri, 25 Oct 2024 19:22:04 +0200

On Fri, 25 Oct 2024 at 07:51, Sergey Poznyakoff
<gray@gnu.org.ua> wrote:
>
> Hi Matteo,
>
> > I'm developing a feature which aligns the file content to the
> > filesystem boundary by adding a PAX comment header.
> > This, along with the --offset option, will make it possible to extract
> > a Debian package (which is an archive embedding a tar one)
> > by using the FICLONERANGE ioctl() under Linux, and so extract the archive
> > without doing any IO.
>
> The idea is interesting.  However, I believe that "without doing any IO"
> is an exaggeration.  Indeed, FICLONERANGE allows you to make contents
> of one file descriptor appear under another file descriptor and that
> doesn't require additional I/O.  But *reading* from that other descriptor
> would mean I/O activity, as usual.  So, it seems that this will have the
> same effect as running
>
>   ar p DEBFILE | tar tfJ -
>
> (modulo compression option).  Am I missing something?
>

Hi Sergey,

Since you're interested, here are more details.
During the download phase, the one done by apt, I pipe the archives
into a tool named `debcow`.
This tool inserts a padding member before data.tar.xz so that the data
member starts at an aligned offset, decompresses data.tar.xz into
data.tar, and adds comments to the PAX headers so that each file's
content is aligned as well:

$ ar -tO coreutils_9.1-1_amd64.deb
debian-binary 0x44
control.tar.xz 0x84
data.tar.xz 0x1c3c
$ debcow <coreutils_9.1-1_amd64.deb >coreutils_9.1-1_amd64_cow.deb
$ ar -tO coreutils_9.1-1_amd64_cow.deb
debian-binary 0x44
control.tar.xz 0x84
_data-pad 0x1c3c
data.tar 0x2000
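
The size of the padding member follows from the ar layout. Here is a
minimal sketch of that arithmetic, assuming the standard 60-byte ar member
header and a 4096-byte block size. The helper name ar_pad_size is
hypothetical and is not taken from debcow:

```c
#include <stdint.h>

/* Size of a standard ar member header, in bytes. */
#define AR_HDR_SIZE 60u

/*
 * Hypothetical helper, not taken from debcow: given the offset at which the
 * padding member's data would start, return how many bytes of padding data
 * are needed so that the data of the following member starts on a multiple
 * of `block` (assumed to be a nonzero even number such as 4096).
 *
 * ar member data always starts at an even offset, so with an even block
 * size the result is even and no extra ar alignment byte is needed.
 */
static uint64_t ar_pad_size(uint64_t pad_data_off, uint64_t block)
{
    /* With no padding data, the next member's data would start one header later. */
    uint64_t next_data_off = pad_data_off + AR_HDR_SIZE;
    uint64_t rem = next_data_off % block;

    return rem ? block - rem : 0;
}
```

For the listing above, ar_pad_size(0x1c3c, 4096) gives 904, and
0x1c3c + 904 + 60 = 0x2000, which matches the offset shown for data.tar.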

Once I have the data.tar and the files aligned on the block boundary,
I extract the files with FICLONERANGE instead of the usual
read()/write() loop:

$ { dd status=none bs=8k skip=1 count=0 && \
    strace -eopenat,ioctl tar -x -C root/ --reflink ; } <coreutils_9.1-1_amd64_cow.deb
openat(3, "./bin/cat", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=4096, src_length=45056, dest_offset=0}) = 0
openat(3, "./bin/chgrp", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=53248, src_length=69632, dest_offset=0}) = 0
openat(3, "./bin/chmod", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=126976, src_length=65536, dest_offset=0}) = 0
openat(3, "./bin/chown", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=196608, src_length=73728, dest_offset=0}) = 0
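
For reference, each ioctl() in that trace corresponds to a call along the
following lines. This is only a sketch and not the code from the patch: the
helper name clone_member_data is hypothetical, and it assumes a Linux
filesystem with reflink support (such as Btrfs or XFS) and block-aligned
source ranges:

```c
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper, not the code from the patch: make `len` bytes that
 * start at `src_off` in src_fd appear at offset 0 of dst_fd by sharing
 * extents instead of copying data.
 * Returns 0 on success, or -1 with errno set on failure (for example when
 * the filesystem cannot reflink or the range is not block aligned).
 */
static int clone_member_data(int src_fd, uint64_t src_off, uint64_t len,
                             int dst_fd)
{
    struct file_clone_range range = {
        .src_fd = src_fd,
        .src_offset = src_off,
        .src_length = len,
        .dest_offset = 0,
    };

    /* The ioctl is issued on the destination descriptor. */
    return ioctl(dst_fd, FICLONERANGE, &range);
}
```

A caller would normally fall back to the usual read()/write() loop when
this returns -1.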

Yes, some I/O is still needed, but only to read the file entries (the
headers) from the tar archive; the file content itself is never read or
written, only cloned.
So when extracting bigger files, the improvement can be substantial.

Regards,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


