Re: [PATCH 1/2] add option to set the offset of the archive


From: Matteo Croce
Subject: Re: [PATCH 1/2] add option to set the offset of the archive
Date: Fri, 25 Oct 2024 21:17:58 +0200

On Fri, 25 Oct 2024 at 19:22, Matteo Croce
<technoboy85@gmail.com> wrote:
>
> On Fri, 25 Oct 2024 at 07:51, Sergey Poznyakoff
> <gray@gnu.org.ua> wrote:
> >
> > Hi Matteo,
> >
> > > I'm developing a feature which aligns the file content to the
> > > filesystem boundary by adding a PAX comment header.
> > > This, along with the --offset option, will make it possible to
> > > extract a Debian package (which is an ar archive embedding a tar one)
> > > using the Linux FICLONERANGE ioctl(), and so extract the archive
> > > without doing any IO.
> >
> > The idea is interesting.  However, I believe that "without doing any IO"
> > is an exaggeration.  Indeed, FICLONERANGE allows you to make contents
> > of one file descriptor appear under another file descriptor and that
> > doesn't require additional I/O.  But *reading* from that other descriptor
> > would mean I/O activity, as usual.  So, it seems that this will have the
> > same effect as running
> >
> >   ar p DEBFILE | tar tfJ -
> >
> > (modulo compression option).  Am I missing something?
> >
>
> Hi Sergey,
>
> Since you're interested, here are more details.
> During the download phase, the one done by apt, I pipe the archives
> into a tool named `debcow`.
> This tool inserts a padding member before data.tar.xz so that the data
> starts at an aligned offset, decompresses data.tar.xz into data.tar,
> and adds comments to the PAX headers to align each file's data:
>
> $ ar -tO coreutils_9.1-1_amd64.deb
> debian-binary 0x44
> control.tar.xz 0x84
> data.tar.xz 0x1c3c
> $ debcow <coreutils_9.1-1_amd64.deb >coreutils_9.1-1_amd64_cow.deb
> $ ar -tO coreutils_9.1-1_amd64_cow.deb
> debian-binary 0x44
> control.tar.xz 0x84
> _data-pad 0x1c3c
> data.tar 0x2000
>
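The padding arithmetic behind the two listings above can be sketched as
follows. This is only an illustration, not debcow's actual code: it assumes,
as the listings suggest, that `ar -tO` prints the offset of each member's
data, that every ar member header is 60 bytes long, and that the target
block size is 4096 bytes. `pad_size` is a hypothetical helper name.

```python
AR_HEADER = 60   # size of an ar member header, in bytes
BLOCK = 4096     # assumed filesystem block size


def pad_size(pad_data_offset: int, block: int = BLOCK) -> int:
    """Size of the padding member's data so that the data of the
    following member starts on a block boundary.

    pad_data_offset is where the padding member's data begins; it is
    assumed to be even, so the result is even too and ar does not need
    to add its own one-byte filler after the member.
    """
    # Layout after the padding data: 60-byte header of the next member,
    # then that member's data, which must land on a multiple of `block`.
    return -(pad_data_offset + AR_HEADER) % block


pad = pad_size(0x1c3c)
print(pad, hex(0x1c3c + pad + AR_HEADER))  # 904 0x2000
```

With the offsets from the listing, a 904-byte `_data-pad` member starting at
0x1c3c puts the data of the next member at 0x2000.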
> Once I have the data.tar and the files aligned on the block boundary,
> I extract the files with FICLONERANGE instead of the usual
> read()/write() loop:
>
> $ { dd status=none bs=8k skip=1 count=0 && strace -eopenat,ioctl tar \
>     -x -C root/ --reflink ; } <coreutils_9.1-1_amd64_cow.deb
> openat(3, "./bin/cat", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=4096, src_length=45056, dest_offset=0}) = 0
> openat(3, "./bin/chgrp", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=53248, src_length=69632, dest_offset=0}) = 0
> openat(3, "./bin/chmod", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=126976, src_length=65536, dest_offset=0}) = 0
> openat(3, "./bin/chown", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
> ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=196608, src_length=73728, dest_offset=0}) = 0
>
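For readers unfamiliar with the ioctl in the trace above, here is a minimal
sketch of a clone-with-fallback step. It is not the code from the patch:
`clone_or_copy` is a hypothetical helper, the request number is the value
that `_IOW(0x94, 13, struct file_clone_range)` takes on x86-64 (other
architectures may encode it differently), and the fallback is an ordinary
read/write loop used whenever the kernel refuses the clone.

```python
import fcntl
import os
import struct

# Assumption: value of FICLONERANGE on x86-64 and most Linux ABIs.
FICLONERANGE = 0x4020940D


def clone_or_copy(src_fd: int, src_off: int, length: int, dst_fd: int) -> bool:
    """Place `length` bytes of src_fd, starting at src_off, at offset 0
    of dst_fd. Returns True if the range was reflinked, False if copied.
    """
    # struct file_clone_range {
    #     __s64 src_fd; __u64 src_offset; __u64 src_length; __u64 dest_offset;
    # };
    arg = struct.pack("=qQQQ", src_fd, src_off, length, 0)
    try:
        fcntl.ioctl(dst_fd, FICLONERANGE, arg)
        return True
    except OSError:
        # Filesystem without reflink support, unaligned range,
        # files on different filesystems, and so on.
        pass

    # Fallback: the usual read()/write() style loop.
    done = 0
    while done < length:
        chunk = os.pread(src_fd, min(1 << 20, length - done), src_off + done)
        if not chunk:
            break  # source is shorter than expected
        done += os.pwrite(dst_fd, chunk, done)
    return False
```

The first ioctl line of the trace would correspond to
`clone_or_copy(0, 4096, 45056, 4)`: when the clone succeeds, no file data
passes through user space.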
> Yes, some I/O is still needed, but only to read the file entries
> (headers) from the tar archive; reading the file content can be skipped.
> So when extracting bigger files, the improvement can be substantial.
>
> Regards,
> --
> Matteo Croce
>
> perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay

Here is a test done on a Linux source tree:

$ time tar xf linux-6.11.5.tar

real    0m13.375s
user    0m0.479s
sys     0m3.094s

$ time tar xf linux-6.11.5.tar --reflink

real    0m3.255s
user    0m0.414s
sys     0m1.722s

The same command analyzed with perf:

$ perf stat tar xf linux-6.11.5.tar

 Performance counter stats for 'tar xf linux-6.11.5.tar':

          4,553.71 msec task-clock:u                     #    0.334 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               168      page-faults:u                    #   36.893 /sec
   <not supported>      cycles:u
   <not supported>      instructions:u
   <not supported>      branches:u
   <not supported>      branch-misses:u

      13.614911389 seconds time elapsed

       0.473461000 seconds user
       3.305497000 seconds sys


$ perf stat tar xf linux-6.11.5.tar --reflink

 Performance counter stats for 'tar xf linux-6.11.5.tar --reflink':

          2,298.44 msec task-clock:u                     #    0.663 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               167      page-faults:u                    #   72.658 /sec
   <not supported>      cycles:u
   <not supported>      instructions:u
   <not supported>      branches:u
   <not supported>      branch-misses:u

       3.464235818 seconds time elapsed

       0.413364000 seconds user
       1.747194000 seconds sys

I'm tidying up the patch and will send it soon.

Regards,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay


