
Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Fri, 23 Oct 2015 13:14:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 23/10/2015 13:12, Pádraig Brady wrote:
> On 22/10/15 20:47, Paolo Bonzini wrote:
>>
>>
>> On 22/10/2015 19:39, Radim Krčmář wrote:
>>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>>> I see a bug in there:
>>>>
>>>> Of course.  You shouldn't have told me what the bug was, I deserved
>>>> to look for it myself. :)
>>>
>>> It rather seems that you don't want spoilers, :)
>>>
>>> I see two bugs now.
>>
>> Me too. :)  But Rusty surely has some testcases in case he wants to
>> adopt some of the ideas here. O:-)
> 
> For completeness, I think this should address the bugs?

Yes, thanks! :D

Paolo

> bool memeqzero4_paolo(const void *data, size_t length)
> {
>     const unsigned char *p = data;
>     unsigned long word;
> 
>     if (!length)
>         return true;
> 
>     /* Check leading bytes one at a time until the remaining
>        length is a multiple of the word size.  */
>     while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
>         if (*p)
>             return false;
>         p++;
>         length--;
>         if (!length)
>             return true;
>     }
> 
>     /* Check a word at a time until the remaining length is a
>        multiple of 16.  */
>     for (;;) {
>         memcpy(&word, p, sizeof(word));
>         if (word)
>             return false;
>         p += sizeof(word);
>         length -= sizeof(word);
>         if (!length)
>             return true;
>         if (__builtin_expect(length & 15, 0) == 0)
>             break;
>     }
> 
>     /* Everything before p is known to be zero, so comparing the
>        buffer with itself at that offset checks the rest.  */
>     return memcmp(data, p, length) == 0;
> }
> 
> Compiled with gcc 5.1.1 -march=native -O2 on an i3-2310M,
> we get these timings:
> 
> bytes  1      8       16      512     65536
> ---------------------------------------------
> Rusty: 10     28      59      114     6510
> Paolo: 9      9       12      75      6495
> 
> It's also smaller, especially at -O3:
> 
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000063 t memeqzero4_rusty
> $ gcc -march=native -O3 memeqzero.c
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000133 t memeqzero4_rusty
> 
> cheers,
> Pádraig.
> 
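The final memcmp in the quoted function compares the buffer with itself at an offset. As a minimal sketch of why that works, the hypothetical helper below (is_zero_overlap_sketch is not from the thread, and the 16-byte prefix size is an assumption chosen for illustration) checks a short prefix byte by byte and then lets a single overlapping memcmp extend that result to the rest of the buffer. It drops the word-at-a-time loop and makes no claim about performance.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only: returns true when all
 * `length` bytes starting at `data` are zero. */
bool is_zero_overlap_sketch(const void *data, size_t length)
{
    const unsigned char *p = data;
    /* Assumed prefix size; any value >= 1 works for the argument below. */
    size_t head = length < 16 ? length : 16;
    size_t i;

    if (length == 0)
        return true;

    /* Step 1: establish that the first `head` bytes are zero. */
    for (i = 0; i < head; i++)
        if (p[i])
            return false;

    /* Step 2: memcmp only reads, so overlapping ranges are fine.
     * Equality here means p[i] == p[i + head] for every i below
     * length - head; since p[0..head) is zero, each later byte must
     * equal a byte already known to be zero. */
    return memcmp(p, p + head, length - head) == 0;
}
```

With a 100-byte zeroed buffer the helper returns true; setting any single byte to a nonzero value makes it return false, whether that byte falls in the prefix or in the part covered by memcmp.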


