coreutils

Re: sort dynamic linking overhead


From: Pádraig Brady
Subject: Re: sort dynamic linking overhead
Date: Mon, 9 Oct 2023 14:48:53 +0100
User-agent: Mozilla Thunderbird

On 08/10/2023 21:53, Pádraig Brady wrote:
On 08/10/2023 14:36, Pádraig Brady wrote:
On 07/10/2023 22:29, Paul Eggert wrote:
On 2023-10-07 04:42, Pádraig Brady wrote:

The auto linking is globally controlled with the --with-openssl
configure option, but you could build sort (and md5sum)
without that dependency with:

      ./configure ac_cv_lib_crypto_MD5=no

Thanks, I was thinking more along the lines that Bruno suggested, which
is to continue to link to libcrypto, but do it with dlopen/dlsym in 'sort'
only when need_random is true.

It's not clear to me offhand whether this should be done entirely in
Coreutils, or whether we should add some Gnulib support to make it
easier to do this sort of lazier linking.
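
For illustration, the dlopen/dlsym approach could look roughly like the
following. This is a minimal sketch, not coreutils or Gnulib code: the
helper names lazy_sym and get_md5 are hypothetical, and the soname
"libcrypto.so.3" is an assumption that matches OpenSSL 3 installs only.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open LIBNAME and return the address of SYMBOL,
   or NULL if either the library or the symbol is unavailable.  */
static void *
lazy_sym (char const *libname, char const *symbol)
{
  /* RTLD_LAZY defers function binding until first call; RTLD_LOCAL
     keeps the library's symbols out of the global namespace.  */
  void *handle = dlopen (libname, RTLD_LAZY | RTLD_LOCAL);
  if (!handle)
    return NULL;
  return dlsym (handle, symbol);
}

/* Same shape as OpenSSL's one-shot MD5 (data, length, digest).  */
typedef unsigned char *(*md5_fn) (unsigned char const *, size_t,
                                  unsigned char *);

/* Hypothetical accessor: resolve MD5 at most once, and only when a
   caller actually asks for it (e.g. when need_random is true).
   Returns NULL if libcrypto could not be loaded, in which case the
   caller must fall back to a built-in implementation.  */
static md5_fn
get_md5 (void)
{
  static int tried;
  static md5_fn fn;
  if (!tried)
    {
      tried = 1;
      /* Assumed soname; other systems may use a different one.  */
      fn = (md5_fn) lazy_sym ("libcrypto.so.3", "MD5");
    }
  return fn;
}
```

With this shape, invocations that never need the hash never pay for
loading the library. Older glibc versions need -ldl at link time.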

I was wondering if this was worth worrying about at all,
but it is a significant overhead that's worth improving.
To quantify the overhead I compared optimized builds,
with and without the above configure option, giving:

$ time seq 10000 | xargs -I'{}' src/sort /dev/null -k'{}'
real    0m7.009s
user    0m3.462s
sys     0m3.578s

$ time seq 10000 | xargs -I'{}' src/sort-lc /dev/null -k'{}'
real    0m12.950s
user    0m3.754s
sys     0m9.200s


So we should do something. Now dlopening libcrypto on demand
would work, but there may be better solutions.
sort doesn't have to use md5. It could use blake2 routines
already in coreutils to avoid the issue (and get some speed ups).
Alternatively it might use some other hash function.
For example see the other 128 bit functions compared at:
https://github.com/Cyan4973/xxHash
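
To show why the choice of hash matters here: sort -R orders lines by a
hash of each key mixed with per-run random data, so equal keys still
end up adjacent. A minimal sketch of that idea follows. It is not the
coreutils implementation: the names are hypothetical, and 64-bit FNV-1a
is used purely as a stand-in for md5, blake2 or xxhash.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in hash (64-bit FNV-1a) with SALT folded into the initial
   state.  A real implementation would use a stronger 128-bit hash.  */
static uint64_t
salted_hash (uint64_t salt, char const *key)
{
  uint64_t h = 14695981039346656037ULL ^ salt;
  for (; *key; key++)
    {
      h ^= (unsigned char) *key;
      h *= 1099511628211ULL;
    }
  return h;
}

/* Salt for the current run; set by random_sort below.  */
static uint64_t sort_salt;

/* qsort comparator: order by salted hash, then by the key itself so
   that a hash collision between different keys cannot separate
   equal keys.  */
static int
compare_random (void const *a, void const *b)
{
  char const *ka = *(char const *const *) a;
  char const *kb = *(char const *const *) b;
  uint64_t ha = salted_hash (sort_salt, ka);
  uint64_t hb = salted_hash (sort_salt, kb);
  if (ha != hb)
    return ha < hb ? -1 : 1;
  return strcmp (ka, kb);
}

/* Sort the N strings in KEYS into an order determined by SALT.  */
static void
random_sort (char const **keys, size_t n, uint64_t salt)
{
  sort_salt = salt;
  qsort (keys, n, sizeof *keys, compare_random);
}
```

In this sketch every comparison hashes both keys, so the cost of the
hash function is multiplied by the number of comparisons.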

BTW there was mention of static linking as an option in this thread.
That is an option to provide better speed and isolation for binaries,
however it's best left to the system builders to use this for their builds.
There can be security implications for prompt library updating,
and libcrypto is particularly sensitive in this regard.

Adding coreutils list...

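So above we've demonstrated that sort dynamically loading libcrypto
nearly doubles the startup time for the process (about 13s vs 7s
for 10000 invocations, or roughly 0.6ms extra per invocation,
almost all of it system time).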

Attached is a patch to use the coreutils reference blake2b hash instead
of the optimized libcrypto md5 routines.

    $ seq 1000000 > 1.txt

    $ time src/sort-md5-lc -R < 1.txt > /dev/null
    real    0m6.734s
    user    0m23.258s
    sys     0m0.047s

    $ time src/sort-blake2 -R < 1.txt > /dev/null
    real    0m7.215s
    user    0m25.683s
    sys     0m0.043s

    $ grep 'model name' /proc/cpuinfo | head -n1
    model name  : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    $ rpm -q openssl-libs
    openssl-libs-3.0.9-2.fc38.x86_64

So while this avoids the startup overhead,
the reference blake2 routines are a little less efficient
than the optimized md5 libcrypto routines.

An incremental patch is attached that uses xxhash128 (0.8.2) instead.
It shows a good improvement (note AVX2 is being used on this CPU):

  $ time src/sort-xxh -R < 1.txt > /dev/null
  real  0m4.111s
  user  0m14.429s
  sys   0m0.058s

I'm not sure how best to avail of it though.
Perhaps embed, or maybe link statically if available?

cheers,
Pádraig

Attachment: sort-xxhash.diff