Re: feature request for coreutils: b2sum


From: Taylor R Campbell
Subject: Re: feature request for coreutils: b2sum
Date: Mon, 8 Jun 2015 21:17:11 +0000
User-agent: IMAIL/1.21; Edwin/3.116; MIT-Scheme/9.1.99

   Date: Mon, 08 Jun 2015 21:24:30 +0100
   From: Padraig Brady <address@hidden>

   On 08/06/15 21:08, Taylor R Campbell wrote:
   > Zooko asked me to send the following timings of portable BLAKE2 C code
   > versus the hand-optimized assembly for MD5 and portable C for SHA-256
   > that one finds in OpenSSL 1.0.1k, computed on a 1.2 GHz Freescale
   > i.MX6 CPU (on different file, from /dev/urandom, of the same size as
   > Zooko reported timings for, 1073741824 bytes):

   Questions...

You probably shouldn't read too much into this crude measurement.

Here is a much more precise performance comparison, closer to what you
will find in SUPERCOP (<http://bench.cr.yp.to/>, which is where you
should look for high-quality performance comparisons of crypto
algorithms):

http://mumble.net/~campbell/tmp/blake2.imx6

The first number on each line is the size of the message in bytes.
The remaining numbers are nanoseconds per byte, measured by
clock_gettime(CLOCK_MONOTONIC) before and after computing the hash,
averaged over 16 trials.  The +1 means the input buffer was unaligned.

The BLAKE2 code, and timing code, for those data are at

http://mumble.net/~campbell/hg/blake2

with the MD5 and SHA-256 timing code adapted slightly to use OpenSSL's
API instead of the BSD libc API for MD5 and SHA-256.

(Yes, that code should use the ARM cycle counter instead of
clock_gettime(CLOCK_MONOTONIC).  Patches welcome!)

The rest of this message is about the less precise measurements of the
code at <https://blake2.net/> previously under discussion here.

   Does the file fit in cache?

Yes.  The machine has 4 GB of RAM.

   A file about quarter the size would be enough for this test I think.

Yes.  I used 1073741824 bytes because that is what zooko had used.

   The md5sum, sha256sum, and sha512sum below were from coreutils
   ./configured --with-openssl=yes ?

On second thought, I'm not sure: md5sum and sha256sum are not linked
against libcrypto, so perhaps not.  It was from the Debian jessie
coreutils 8.23-4 package for armhf.

On the other hand, I get about the same timings from `openssl md5' and
`openssl sha256', so perhaps md5sum and sha256sum were just statically
linked against OpenSSL.

   > $ time md5sum randfile.0
   > 7af160fa500c6ad20be1c8119c9141f8  randfile.0
   > 
   > real    0m9.132s
   > user    0m6.600s
   > sys     0m2.530s

   I presume this was precached?

Yes.  I warmed the cache by running each program twice first.

   > $ time b2sum randfile.0
   > ea2c77e755d0f5c84e9fff444cd6ce83a566b134d43e4fe37ed53886e0ca5c7e6141968498d5d765c4190e4b567c437337e8e57ef5ba9306cc11db29a4b9e987  randfile.0
   > 
   > real    0m48.012s
   > user    0m46.070s
   > sys     0m1.900s

   I presume the above was for sha512sum

This was BLAKE2b, i.e. the 512-bit BLAKE2 hash function, which is the
default for b2sum.  I copied zooko's invocations verbatim.

   > $ time b2sum -a blake2sp randfile.0
   > 2886c0adfd613381d02f18a8ed18527c98d88b115a974e61e030fb914118bd0d randfile.0
   > 
   > real    0m9.880s
   > user    0m23.610s
   > sys     0m3.260s

   So this b2sum implementation is multithreaded
   and has about the same total computational cost as sha256sum?

It appears to be multithreaded with OpenMP.  I'm using more or less
the same BLAKE2 code that zooko reported from <https://blake2.net/>,
specifically blake2_code_20150529.zip.
