Re: [PATCH 1/2] add option to set the offset of the archive
From: Matteo Croce
Subject: Re: [PATCH 1/2] add option to set the offset of the archive
Date: Fri, 25 Oct 2024 21:17:58 +0200
On Fri, 25 Oct 2024 at 19:22, Matteo Croce
<technoboy85@gmail.com> wrote:
>
> On Fri, 25 Oct 2024 at 07:51, Sergey Poznyakoff
> <gray@gnu.org.ua> wrote:
> >
> > Hi Matteo,
> >
> > > I'm developing a feature which aligns the file content to the
> > > filesystem boundary by adding a PAX comment header.
> > > This, along with the --offset option, will make it possible to extract
> > > a Debian package (which is an ar archive embedding a tar one)
> > > by using the FICLONERANGE ioctl() under Linux, and to extract the
> > > archive without doing any IO.
> >
> > The idea is interesting. However, I believe that "without doing any IO"
> > is an exaggeration. Indeed, FICLONERANGE allows you to make contents
> > of one file descriptor appear under another file descriptor and that
> > doesn't require additional I/O. But *reading* from that other descriptor
> > would mean I/O activity, as usual. So, it seems that this will have the
> > same effect as running
> >
> > ar p DEBFILE | tar tfJ -
> >
> > (modulo compression option). Am I missing something?
> >
>
> Hi Sergey,
>
> Since you're interested, here are more details.
> During the download phase, the one done by apt, I pipe the archives
> into a tool named `debcow`.
> This tool adds a member before data.tar.xz to align the starting
> offset of the data, decompresses the tarball, and adds comments in the
> PAX headers to align the file data:
>
> $ ar -tO coreutils_9.1-1_amd64.deb
> debian-binary 0x44
> control.tar.xz 0x84
> data.tar.xz 0x1c3c
> $ debcow <coreutils_9.1-1_amd64.deb >coreutils_9.1-1_amd64_cow.deb
> $ ar -tO coreutils_9.1-1_amd64_cow.deb
> debian-binary 0x44
> control.tar.xz 0x84
> _data-pad 0x1c3c
> data.tar 0x2000
>
> Once I have the data.tar and the files aligned on the block boundary,
> I extract the files with FICLONERANGE instead of the usual
> read()/write() loop:
>
> $ { dd status=none bs=8k skip=1 count=0 && strace -eopenat,ioctl tar
> -x -C root/ --reflink ; } <coreutils_9.1-1_amd64_cow.deb
> openat(3, "./bin/cat",
> O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0,
> src_offset=4096, src_length=45056, dest_offset=0}) = 0
> openat(3, "./bin/chgrp",
> O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0,
> src_offset=53248, src_length=69632, dest_offset=0}) = 0
> openat(3, "./bin/chmod",
> O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0,
> src_offset=126976, src_length=65536, dest_offset=0}) = 0
> openat(3, "./bin/chown",
> O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0,
> src_offset=196608, src_length=73728, dest_offset=0}) = 0
>
> Yes, some I/O is still needed, but only to read the file entries from
> the tar archive; the file content can be skipped.
> So when extracting bigger files, the improvement can be substantial.
>
> Regards,
> --
> Matteo Croce
>
> perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay
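The alignment arithmetic behind the `ar -tO` listings quoted above can be checked with a short Python sketch. It is only an illustration, not debcow's actual code: the helper names are hypothetical, the 4096-byte block size is an assumption inferred from the offsets shown, and the 60-byte figure is the fixed size of an ar member header.

```python
AR_HEADER_SIZE = 60  # fixed size of an ar member header, in bytes


def pad_payload_size(pad_data_offset, block_size=4096):
    """Size of the padding member's payload (hypothetical helper).

    pad_data_offset is where the padding member's data starts. After the
    padding payload comes the next member's 60-byte header, and that
    member's data must start on a multiple of block_size (assumed 4096).
    """
    unpadded_next_data = pad_data_offset + AR_HEADER_SIZE
    return -unpadded_next_data % block_size


def is_block_aligned(value, block_size=4096):
    return value % block_size == 0


# Offsets taken from the ar listing in the message above.
pad = pad_payload_size(0x1C3C)
data_tar_offset = 0x1C3C + pad + AR_HEADER_SIZE
print(pad, hex(data_tar_offset))

# (src_offset, src_length) pairs copied from the strace output above.
clone_ranges = [(4096, 45056), (53248, 69632), (126976, 65536), (196608, 73728)]
ranges_aligned = all(
    is_block_aligned(off) and is_block_aligned(length)
    for off, length in clone_ranges
)
print(ranges_aligned)
```

With the offsets from the listing, the computed start of data.tar matches the 0x2000 reported by `ar -tO`, which is also the 8 KiB that the `dd bs=8k skip=1` invocation skips.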
This is a test done on a Linux source tree:
$ time tar xf linux-6.11.5.tar
real 0m13.375s
user 0m0.479s
sys 0m3.094s
$ time tar xf linux-6.11.5.tar --reflink
real 0m3.255s
user 0m0.414s
sys 0m1.722s
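For readers unfamiliar with the ioctl behind these numbers, here is a minimal Python sketch of cloning a byte range with FICLONERANGE, falling back to a plain read/write loop when the filesystem refuses. It is not the code from the tar patch: `clone_or_copy` is a hypothetical helper, the request number is the Linux value of `_IOW(0x94, 13, struct file_clone_range)`, and the demo file names are made up.

```python
import fcntl
import os
import struct
import tempfile

# Linux request number: _IOW(0x94, 13, struct file_clone_range)
FICLONERANGE = 0x4020940D


def clone_or_copy(src_fd, src_offset, length, dst_fd, dst_offset=0):
    """Reflink a byte range if possible, otherwise copy it.

    Returns "reflink" or "copy" to report which path was taken.
    """
    # struct file_clone_range { s64 src_fd; u64 src_offset, src_length, dest_offset; }
    arg = struct.pack("=qQQQ", src_fd, src_offset, length, dst_offset)
    try:
        fcntl.ioctl(dst_fd, FICLONERANGE, arg)
        return "reflink"
    except OSError:
        pass  # no reflink support, unaligned range, different filesystem, ...
    copied = 0
    while copied < length:
        chunk = os.pread(src_fd, min(1 << 20, length - copied), src_offset + copied)
        if not chunk:
            break  # source is shorter than requested
        # pwrite may write less than asked; the next pread resumes from there.
        copied += os.pwrite(dst_fd, chunk, dst_offset + copied)
    return "copy"


# Demo: one block of filler followed by one block of "file content".
with tempfile.TemporaryDirectory() as tmp:
    payload = os.urandom(4096)
    src_path = os.path.join(tmp, "archive.bin")
    dst_path = os.path.join(tmp, "member.bin")
    with open(src_path, "wb") as f:
        f.write(b"\0" * 4096 + payload)
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o755)
    try:
        method = clone_or_copy(src_fd, 4096, 4096, dst_fd)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
    with open(dst_path, "rb") as f:
        extracted = f.read()

print(method, extracted == payload)
```

Which path is taken depends on the filesystem holding the temporary directory: reflink-capable filesystems such as btrfs, or XFS with reflink enabled, should take the ioctl path, while others fall through to the copy loop.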
The same command analyzed with perf:
$ perf stat tar xf linux-6.11.5.tar
Performance counter stats for 'tar xf linux-6.11.5.tar':
4,553.71 msec task-clock:u # 0.334 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
168 page-faults:u # 36.893 /sec
<not supported> cycles:u
<not supported> instructions:u
<not supported> branches:u
<not supported> branch-misses:u
13.614911389 seconds time elapsed
0.473461000 seconds user
3.305497000 seconds sys
$ perf stat tar xf linux-6.11.5.tar --reflink
Performance counter stats for 'tar xf linux-6.11.5.tar --reflink':
2,298.44 msec task-clock:u # 0.663 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
167 page-faults:u # 72.658 /sec
<not supported> cycles:u
<not supported> instructions:u
<not supported> branches:u
<not supported> branch-misses:u
3.464235818 seconds time elapsed
0.413364000 seconds user
1.747194000 seconds sys
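To put the figures side by side, the following Python sketch computes the ratios from the `time` output and recomputes the "CPUs utilized" column from the perf counters. The numbers are copied from the runs above; the dictionary layout and function names are just for illustration.

```python
# Figures from the `time` runs above, in seconds.
runs = {
    "read/write": {"real": 13.375, "user": 0.479, "sys": 3.094},
    "reflink": {"real": 3.255, "user": 0.414, "sys": 1.722},
}

# Figures from the `perf stat` runs above: (task-clock in ms, elapsed seconds).
perf = {
    "read/write": (4553.71, 13.614911389),
    "reflink": (2298.44, 3.464235818),
}


def speedup(metric):
    """Ratio of the plain extraction time to the --reflink time."""
    return runs["read/write"][metric] / runs["reflink"][metric]


def cpus_utilized(name):
    """task-clock divided by wall-clock time, as perf reports it."""
    task_clock_ms, elapsed_s = perf[name]
    return task_clock_ms / (elapsed_s * 1000.0)


for metric in ("real", "user", "sys"):
    print(f"{metric}: {speedup(metric):.2f}x")
for name in perf:
    print(f"{name}: {cpus_utilized(name):.3f} CPUs utilized")
```

The wall-clock time shrinks by roughly a factor of four, while the user time barely changes; the gain comes from less time in the kernel and less time waiting on I/O, consistent with the higher CPU utilization of the --reflink run.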
I'm tidying up the patch and will send it soon.
Regards,
--
Matteo Croce
perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay