Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization


From: Li, Liang Z
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
Date: Thu, 12 Nov 2015 08:53:32 +0000

> On 12/11/2015 03:49, Li, Liang Z wrote:
> > I am very surprised by the live migration performance result when
> > I use your 'memeqzero4_paolo' instead of the SSE2 intrinsics to
> > check for zero pages.
> 
> What code were you using?  Remember I suggested using only unsigned long
> checks, like
> 
>       unsigned long *p = ...
>       if (p[0] || p[1] || p[2] || p[3]
>           || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
>               return BUFFER_NOT_ZERO;
>       else
>               return BUFFER_ZERO;
> 
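
For reference, the quoted suggestion can be written out as a complete function. This is only a minimal sketch: the name buffer_is_zero_ul is hypothetical, and it assumes the buffer is aligned for unsigned long and that its size is a multiple of the word size covering at least four words (as a 4 KiB page would be).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the suggested check. The name is hypothetical.
 * Assumptions: buf is aligned for unsigned long, and size is a
 * multiple of sizeof(unsigned long) covering at least four words.
 */
static bool buffer_is_zero_ul(const void *buf, size_t size)
{
    const unsigned long *p = buf;
    const size_t head = 4 * sizeof(unsigned long);

    assert(size >= head && size % sizeof(unsigned long) == 0);

    /* Cheap initial check: a page with data near its start exits here. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * The first four words are zero. Comparing the buffer with itself
     * shifted by four words then shows, by induction, that every
     * later byte is zero as well.
     */
    return memcmp(p + 4, p, size - head) == 0;
}
```

The self-comparison works because memcmp only reads its arguments, so the two ranges are allowed to overlap.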

I used the following code:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check bytes one at a time until the remaining length is a
     * multiple of the word size.  */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check at least one word, continuing a word at a time until the
     * remaining length is a multiple of 16 bytes.  p may be unaligned
     * here, so load each word through memcpy.  */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* The bytes before p are known to be zero, so comparing the
     * buffer with itself at that offset checks the rest.  */
    return memcmp(data, p, length) == 0;
}

> > The total live migration time increased by about 8% instead of
> > decreasing, although in the unit test your 'memeqzero4_paolo'
> > performs better. Any idea why?
> 
> You only tested the case of zero pages.  But real pages usually are not zero,
> even if they have a few zero bytes at the beginning.  It's very important to
> optimize the initial check before the memcmp call.
> 
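
One way to act on this point is to time each candidate on non-zero pages as well as zero pages. The helpers below are a rough sketch with hypothetical names (zero_check_fn, is_zero_bytewise, fill_test_page, time_zero_check); the byte loop is only a stand-in candidate, and a real benchmark would need more care with compiler optimization and cache effects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the zero-check candidates under test. */
typedef bool (*zero_check_fn)(const void *buf, size_t size);

/* Plain byte loop, used here only as a stand-in candidate. */
static bool is_zero_bytewise(const void *buf, size_t size)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < size; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Fill a page with non-zero data preceded by zero_prefix zero bytes.
 * zero_prefix == size yields an all-zero page; zero_prefix == 0 yields
 * a page that is non-zero from its first byte.
 */
static void fill_test_page(unsigned char *page, size_t size,
                           size_t zero_prefix)
{
    assert(zero_prefix <= size);
    memset(page, 0, zero_prefix);
    memset(page + zero_prefix, 0xA5, size - zero_prefix);
}

/* CPU seconds spent on iters calls of fn over one page. */
static double time_zero_check(zero_check_fn fn, const void *page,
                              size_t size, unsigned long iters)
{
    volatile bool sink = false; /* keeps the result from being discarded */
    clock_t start = clock();

    for (unsigned long i = 0; i < iters; i++) {
        sink = fn(page, size);
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_zero_check() for zero_prefix values such as 0, 16 and the full page size would show how each candidate behaves on pages that differ from zero early, which an all-zero test cannot reveal.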

In the unit test I also tested only zero pages, and 'memeqzero4_paolo'
performed better there.
But when merged into QEMU, it caused a performance drop. Why?

> Paolo
