bug-gnulib

Re: [PATCH] Improve sha*sum speed


From: Loïc Le Loarer
Subject: Re: [PATCH] Improve sha*sum speed
Date: Mon, 12 Sep 2011 17:18:51 +0200

Hi,

In fact, I do have access to a Pentium(R) 4 CPU at 2.80GHz. It is much
slower, so I replaced the 1G zero-byte file with a 100M one and, of
course, tested only the 32-bit binaries. You can find the results
attached.

It seems that the impact of my patches is much more convincing on
this architecture.

If anyone can test on other systems, it would be good.
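
For anyone who wants to repeat the measurement elsewhere, the procedure
quoted below can be sketched as a small shell script. This is only a
sketch under some assumptions: the function names make_zero_file and
bench_sum are made up for illustration, and it reports wall-clock time
using "date +%s.%N" (a GNU date feature) rather than the user/sys
figures that "\time" prints.

```shell
#!/bin/sh
# make_zero_file NAME BYTES
# Create a sparse file of exactly BYTES zero bytes by seeking to the last
# offset and writing a single byte (same dd trick as in the quoted mail).
make_zero_file() {
    dd if=/dev/zero of="$1" bs=1 count=1 seek=$(( $2 - 1 )) 2>/dev/null
}

# bench_sum PROGRAM FILE [RUNS]
# Run PROGRAM on FILE RUNS times (default 3) and print one line per run
# with the elapsed wall-clock time in seconds.
bench_sum() {
    prog=$1
    file=$2
    runs=${3:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)            # sub-second part assumes GNU date
        "$prog" "$file" >/dev/null
        end=$(date +%s.%N)
        awk -v n="$i" -v p="$prog" -v s="$start" -v e="$end" \
            'BEGIN { printf "run %d: %s %.3f s\n", n, p, e - s }'
        i=$(( i + 1 ))
    done
}
```

Typical use, with the file size adjusted to the speed of the machine:
"make_zero_file zero100M $(( 100 * 1000 * 1000 ))" followed by
"bench_sum ./sha1sum zero100M", once for the unpatched binary and once
for the patched one.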

Thanks in advance,
Best regards
Loïc

2011/9/12 Loïc Le Loarer <address@hidden>:
> Hi,
>
> Here are my latest results and patches. Please find the patches to
> sha1.c, sha256.c and sha512.c attached, and the "time" output of the
> resulting binaries in sha_benchs.log. For all binaries, in 64- and
> 32-bit modes (.m32), I ran the command "\time sha*sum zero1G" 3 times,
> where zero1G is a 10^9-byte file created by the command:
> dd if=/dev/zero of=zero1G count=1 bs=1 seek=$(( 1000 * 1000 * 1000 - 1 ))
>
> The compilation of coreutils was done using the command
> make CFLAGS="-O3"
> for 64 bit version and
> make CFLAGS="-m32 -O3"
> for 32 bit version.
>
> gcc is version 4.4.5 (Ubuntu 10.10)
>
> My CPU is a Sandy Bridge @2.5GHz.
>
> For sha1, the result is very close to Linus' version for git.
>
> I think it would be a good idea to include those patches to improve
> the C versions; they are probably close to the best that can be done in
> "pure" C.
>
> To improve further, assembly with or without SSE could be done in a
> second pass.
>
> What do you think of that?
>
> I don't have GCC compile farm access yet, so I can only test on my
> system for now.
>
> Best regards.
> Loïc
>
> 2011/9/6 Pádraig Brady <address@hidden>:
>> On 09/06/2011 02:25 PM, Loïc Le Loarer wrote:
>>> Hi Pádraig,
>>>
>>> Thank you for your answer.
>>>
>>> 2011/9/6 Pádraig Brady <address@hidden>
>>>
>>>     A few general points.
>>>     You essentially used Linus' code (albeit by
>>>     very helpfully isolating the significant differences).
>>>     It might be easier/required to just include it in gnulib?
>>>     There are a few files in gnulib that are not copyright of the FSF,
>>>     so would Nicolas and Linus need to assign copyright?
>>>
>>>
>>> Yes, this is what I did. I don't think that including Linus' code is
>>> easier, as the functions have a different prototype. Also, sha1, sha256
>>> and sha512 share the same structure in gnulib, so changing one without
>>> changing the others would be weird. But if you think it is required, I
>>> have no problem with that.
>>
>> Ok, let's just use your patches to gnulib so.
>> The techniques were fairly generic anyway.
>>
>>>
>>> By the way, I have done a test on sha512 and improved the time on the
>>> same 1GB zero file from 4.5s to 3.9s. Please find the patch attached. So
>>> I think that using the same techniques, we could improve the speed of
>>> all the sha's.
>>>
>>>     For performance testing I've found gcc generates
>>>     much more deterministic results with a -march
>>>     as close to native as possible or otherwise
>>>     the code is very susceptible to alignment issues etc.
>>>     Your compiler supports -march=native.
>>>     Note also gcc 4.6 has much better support for your sandy bridge CPU,
>>>     either with -march=native or -march=corei7-avx
>>>
>>>
>>> I tried using gcc-4.6.1 (I recompiled it on my Ubuntu 10.10) but I
>>> couldn't see any difference. For me, no combination of -march=native
>>> or not and gcc 4.4.5 or 4.6.1 makes a difference; all the times are
>>> within the measurement margin.
>>
>> OK that at least confirms the improvement is fairly deterministic.
>>
>>>
>>>     As for the SSE version, I would also like to see that included,
>>>     given the proportion of hardware supporting that these days.
>>>     I previously noticed a coreutils SSE2 patch here:
>>>     http://www.arctic.org/~dean/crypto/sha1.html
>>>     Though we'd probably need some runtime SSE detection to include that.
>>>
>>>
>>> Ok, I could try to work on this. The real problem is testing that
>>> compilation and SSE detection are done correctly on several platforms. I
>>> only have access to a few x86 machines; what is the usual way to test
>>> more platforms?
>>
>> It would probably be best to get an account on the GCC compile farm.
>> http://gcc.gnu.org/wiki/CompileFarm
>>
>> cheers,
>> Pádraig.
>>
>
>
>
> --
> Loïc
>



-- 
Loïc

Attachment: sha_benchs_p4.log
Description: Text Data

