From: Pádraig Brady
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Fri, 23 Oct 2015 12:15:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 22/10/15 20:47, Paolo Bonzini wrote:
> 
> 
> On 22/10/2015 19:39, Radim Krčmář wrote:
>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>> I see a bug in there:
>>>
>>> Of course.  You shouldn't have told me what the bug was, I deserved
>>> to look for it myself. :)
>>
>> It rather seems that you don't want spoilers, :)
>>
>> I see two bugs now.
> 
> Me too. :)  But Rusty surely has some testcases in case he wants to
> adopt some of the ideas here. O:-)

For completeness, I think this should address the bugs:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check leading bytes one at a time until the remaining length is a
     * multiple of the word size.  */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check up to 16 bytes a word at a time.  */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* The first (p - data) bytes are now known to be zero, so comparing
     * the buffer with itself at that offset checks the rest.  */
    return memcmp(data, p, length) == 0;
}
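
The final memcmp() call above relies on an overlapping self-comparison. As a minimal sketch of that trick in isolation (all_zero_overlap and its prefix parameter are hypothetical names for illustration, not part of the patch): once the first prefix bytes are known to be zero, comparing the buffer with itself shifted by prefix proves the rest is zero too.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the patch: returns true if all `length`
 * bytes at `data` are zero.  `prefix` is how many leading bytes are
 * checked by hand before handing over to memcmp(). */
static bool all_zero_overlap(const void *data, size_t length, size_t prefix)
{
    const unsigned char *p = data;
    size_t i;

    if (length == 0)
        return true;
    if (prefix == 0 || prefix > length)
        prefix = length;        /* degenerate case: check every byte by hand */

    /* Step 1: verify the prefix byte by byte. */
    for (i = 0; i < prefix; i++)
        if (p[i])
            return false;

    /* Step 2: compare data[0 .. length-prefix) with data[prefix .. length).
     * Equality means data[i + prefix] == data[i] for every i, and the
     * first `prefix` bytes are zero, so by induction every byte is zero. */
    return memcmp(p, p + prefix, length - prefix) == 0;
}
```

In the function above, the role of prefix is played by p - data, which is at least sizeof(word) by the time memcmp() is reached.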

Compiled with gcc 5.1.1 using -march=native -O2 on an i3-2310M,
we get these timings:

bytes  1        8       16      512     65536
---------------------------------------------
Rusty: 10       28      59      114     6510
Paolo: 9        9       12      75      6495
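
The mail does not say how these timings were taken or what unit they are in. Purely as an assumption, a rough harness for comparing such functions on an all-zero buffer might look like the sketch below; bench_ns_per_call, zero_check_fn and memeqzero_naive are hypothetical names, and clock() measures CPU time at coarse resolution, so only large iteration counts give meaningful numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Signature shared by the zero-check functions being compared. */
typedef bool (*zero_check_fn)(const void *data, size_t length);

/* Naive byte-at-a-time reference, useful as a baseline. */
static bool memeqzero_naive(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length--)
        if (*p++)
            return false;
    return true;
}

/* Returns the average CPU time per call in nanoseconds for an all-zero
 * buffer of `length` bytes, or -1.0 if iters is 0 or allocation fails. */
static double bench_ns_per_call(zero_check_fn fn, size_t length,
                                unsigned long iters)
{
    unsigned char *buf = calloc(length ? length : 1, 1);
    volatile bool sink = false;
    clock_t start, end;
    unsigned long i;

    if (!buf || iters == 0) {
        free(buf);
        return -1.0;
    }

    start = clock();
    for (i = 0; i < iters; i++)
        sink = fn(buf, length);  /* volatile store keeps the result live;
                                    a serious benchmark needs more care to
                                    stop the compiler hoisting the call */
    end = clock();
    (void)sink;

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

A call such as bench_ns_per_call(memeqzero_naive, 512, 1000000) would give one cell of a table like the one above, under these assumptions.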

It's also smaller, especially at -O3 (nm -S prints each symbol's size in hex):

$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000063 t memeqzero4_rusty
$ gcc -march=native -O3 memeqzero.c
$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000133 t memeqzero4_rusty

cheers,
Pádraig.



