bug-tar

Re: [Bug-tar] Use posix_fadvise to improve archive creation performance


From: Carlo Alberto Ferraris
Subject: Re: [Bug-tar] Use posix_fadvise to improve archive creation performance
Date: Sun, 7 May 2017 15:18:12 +0900

Paul,
any chance of having this pulled in? To recap, this simply uses posix_fadvise to hint to the OS that we’re going to read the source files sequentially when creating an archive.
In our testing on Linux 4.4, creating a tar archive from source files on an ext4 filesystem (on a SAN volume), this patch doubles tar's throughput. This is because, when Linux is given the POSIX_FADV_SEQUENTIAL hint, it doubles the default readahead size of the underlying block device for that file.

Any feedback is welcome.

Carlo

On Apr 13, 2017, at 1:32 PM, Carlo Alberto Ferraris <address@hidden> wrote:

Paul,
friendly ping.

Carlo

On Apr 5, 2017, at 10:16 AM, Carlo Alberto Ferraris <address@hidden> wrote:

Just a comment on why my patch passes len explicitly instead of 0: it’s to work around a bug in Linux kernels before 2.6.6. From the posix_fadvise man page:

> In kernels before 2.6.6, if len was specified as 0, then this was interpreted
> literally as "zero bytes", rather than as meaning "all bytes through to the
> end of the file".

Since 0 is supposed to mean “until the end of file”, explicitly passing the length of the file (which we already have) should be semantically the same.
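
A minimal sketch of that idea, for illustration only. The helper name advise_whole_file() is hypothetical and is not the code from the patch; it assumes the file size is obtained with fstat(), whereas tar already has it at hand.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes posix_fadvise() in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper, not taken from the patch: hint that the whole
   file behind FD will be read sequentially.  Returns 0 on success or
   an errno-style error number on failure.  */
int
advise_whole_file (int fd)
{
  struct stat st;

  if (fstat (fd, &st) != 0)
    return errno;

  /* Pass the real size instead of 0: kernels before 2.6.6 took a len
     of 0 literally as "zero bytes".  */
  return posix_fadvise (fd, 0, st.st_size, POSIX_FADV_SEQUENTIAL);
}
```

Note that posix_fadvise() returns the error number directly instead of setting errno, which is why the helper can return its result as-is.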

Carlo

On Apr 4, 2017, at 10:14 PM, Mark <address@hidden> wrote:

On Mon, April 3, 2017 03:17, Paul Eggert wrote:
> I've lost context. I prefer not having this depend on an environment
> variable.
>
> Can't the filesystem in question be fixed to have decent performance in
> the typical case where applications access files sequentially? It's not
> like 'tar' is a special case. I'd hate to have to modify lots of
> programs just to work around a lame filesystem.

I think you're confusing two things:
- Carlo's patch
- The suggestion to allow the user to tell tar to use POSIX_FADV_NOREUSE
and/or POSIX_FADV_DONTNEED. In certain scenarios one, both or neither of
those could perform best. [On Linux POSIX_FADV_NOREUSE is currently a
no-op.]

Let's ignore the second point for now.

Carlo's patch at
https://github.com/CAFxX/tar/commit/8b3ccb099c6ddf9f03d12d1f7c433c7927b964d5
uses
 posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
 posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
to give a hint to the OS/filesystem about how the file will be accessed.
There shouldn't be any downside to doing that. On Linux, for example,
POSIX_FADV_SEQUENTIAL causes the filesystem read-ahead amount to be
doubled.
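
As a rough sketch of what those two calls amount to when wrapped in one function: the name hint_sequential_read() is made up here and is not the prefetch() function from the patch, and the way errors are reported is an assumption of this example.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes posix_fadvise() in <fcntl.h> */
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: tell the kernel that LEN bytes starting at
   OFFSET in FD will be read sequentially and will be needed soon.
   Returns 0 if both hints were accepted, otherwise the error number
   of the first hint that failed.  */
int
hint_sequential_read (int fd, off_t offset, off_t len)
{
  /* posix_fadvise() returns the error number itself; it does not
     set errno.  */
  int seq_err = posix_fadvise (fd, offset, len, POSIX_FADV_SEQUENTIAL);
  int need_err = posix_fadvise (fd, offset, len, POSIX_FADV_WILLNEED);

  return seq_err != 0 ? seq_err : need_err;
}
```

Because these are only hints, a caller can reasonably ignore the return value and carry on reading the file as usual.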

I'm not qualified to say whether the patch should be committed as-is, but
the principle is sound. [I might choose a different name for the
prefetch() function though.]





