On Wed, March 29, 2017 10:01, Joerg Schilling wrote:
Paul Eggert <address@hidden> wrote:
On 03/27/2017 07:02 AM, Carlo Alberto Ferraris wrote:
This is a PoC patch that improves archive creation performance at
least in certain configurations
What configuration performs poorly with sequential access? How much
improvement do you see with the patch, and why?
I doubt that such methods will help to speed up archiving. I did many
tests with similar approaches with star since approx. 1997 and I did
never see any performance win on any modern OS.
Carlo's patch calls
    posix_fadvise(fd, offset, len,
                  POSIX_FADV_WILLNEED|POSIX_FADV_SEQUENTIAL|POSIX_FADV_NOREUSE);
According to
http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fadvise.html
"The advice to be applied to the data is specified by the advice parameter
and may be one of the following values: [lists various POSIX_FADV_xxx
definitions]"
You can't bitwise OR several POSIX_FADV_xxx values together when calling
posix_fadvise(). Instead you would need to call posix_fadvise() three
times:
    posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
    posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(fd, offset, len, POSIX_FADV_NOREUSE);
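
As a rough sketch of the three separate calls with error handling: the helper name advise_read_hints() is made up for illustration and is not from Carlo's patch. It relies on the fact that posix_fadvise() returns the error number directly rather than setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper (not from the patch): apply each hint in its own
 * call. posix_fadvise() returns 0 on success or an error number; it does
 * not set errno. Returns the first nonzero error, or 0 if all succeeded. */
static int advise_read_hints(int fd, off_t offset, off_t len)
{
    static const int hints[] = {
        POSIX_FADV_WILLNEED,
        POSIX_FADV_SEQUENTIAL,
        POSIX_FADV_NOREUSE,
    };
    int first_err = 0;

    assert(len >= 0);  /* a negative length is always rejected with EINVAL */

    for (size_t i = 0; i < sizeof hints / sizeof hints[0]; i++) {
        int err = posix_fadvise(fd, offset, len, hints[i]);
        if (err != 0 && first_err == 0)
            first_err = err;
    }
    return first_err;
}
```

Since the hints are only advisory, a caller would probably log or ignore a nonzero return rather than abort the archive.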
Looking at Linux include/uapi/linux/fadvise.h bears that out: the
constants are consecutive small integers, not bit flags.
POSIX_FADV_NOREUSE is 5, the same value as POSIX_FADV_DONTNEED |
POSIX_FADV_RANDOM (4 | 1).
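
To make the collision concrete, here is a small sketch. It assumes the generic Linux values 0 through 5 from fadvise.h (other platforms may number them differently), and the function names are invented for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper: returns 1 if "advice" is exactly one of the six
 * POSIX advice constants, 0 otherwise. */
static int is_single_advice(int advice)
{
    switch (advice) {
    case POSIX_FADV_NORMAL:
    case POSIX_FADV_RANDOM:
    case POSIX_FADV_SEQUENTIAL:
    case POSIX_FADV_WILLNEED:
    case POSIX_FADV_DONTNEED:
    case POSIX_FADV_NOREUSE:
        return 1;
    default:
        return 0;
    }
}

/* The value the patch passes: three constants OR'ed together. */
static int patch_advice(void)
{
    return POSIX_FADV_WILLNEED | POSIX_FADV_SEQUENTIAL | POSIX_FADV_NOREUSE;
}

/* Assumption: the generic Linux numbering (NORMAL=0 ... NOREUSE=5). */
static void check_linux_advice_values(void)
{
    /* 4 | 1 == 5: OR'ing two hints yields a third, unrelated hint. */
    assert((POSIX_FADV_DONTNEED | POSIX_FADV_RANDOM) == POSIX_FADV_NOREUSE);
    /* 3 | 2 | 5 == 7: not a defined advice value at all. */
    assert(!is_single_advice(patch_advice()));
}
```

With those values the OR'ed argument is 7, which matches none of the defined hints, so I would expect the kernel to reject the call with EINVAL rather than apply any of the three.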
Whether or not the OS does anything with posix_fadvise() hints is up to
the OS. But I seem to remember reading that Linux uses a larger read-ahead
if told that a file will be read sequentially.
Both POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL probably won't do any
harm, since they match the way in which tar reads files.
For POSIX_FADV_NOREUSE, Linux currently treats that as a no-op, though a
patch was proposed a few years ago.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/mm/fadvise.c?id=refs/tags/v4.10.6
https://lwn.net/Articles/480930/
It might be a good idea to add an option to tar to use
POSIX_FADV_DONTNEED, since that could reduce tar's impact on other
processes (less filling the page cache with file data and evicting the
working set of other programs).
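
A minimal sketch of what such an option might do per file, purely as an assumption about one possible approach: read_then_drop_cache() is a hypothetical name, not an existing tar function, and a real implementation would write each chunk to the archive and handle EINTR.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of GNU tar: read fd to EOF in fixed-size
 * chunks, then tell the kernel the cached pages are no longer needed.
 * Returns the number of bytes read, or -1 on a read error. */
static long long read_then_drop_cache(int fd)
{
    char buf[65536];
    long long total = 0;
    ssize_t n;

    assert(fd >= 0);

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* A real archiver would write buf[0..n) to the archive here. */
        total += n;
    }
    if (n < 0)
        return -1;

    /* offset 0 with len 0 covers the file from the start to its end.
     * The hint is advisory, so a failure is deliberately ignored. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return total;
}
```

Issuing the hint once after the last read keeps the sketch simple; for very large files it might be better to drop the already-read range periodically, but that is a tuning question I have not measured.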