Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations

From:	Peter Lieven
Subject:	Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Date:	Tue, 19 Mar 2013 17:14:40 +0100

Am 19.03.2013 um 17:08 schrieb Eric Blake <address@hidden>:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>> 
>> Signed-off-by: Peter Lieven <address@hidden>
>> ---
>> util/cutils.c |    7 +++++++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 857dd7d..00d98fb 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -190,6 +190,13 @@ size_t buffer_find_nonzero_offset(const void *buf, 
>> size_t len)
>>  */
>> bool buffer_is_zero(const void *buf, size_t len)
>> {
>> +    /* use vector optimized zero check if possible */
>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0 
>> +          && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +             * sizeof(VECTYPE)) == 0) {
> 
> Is it worth factoring this check into something more reusable, by adding
> something like 'bool buffer_can_use_vectors(buf, len)' in patch 2/9?

good idea!

> 
>> +        return buffer_find_nonzero_offset(buf, len)==len;
> 
> Spaces around binary operators.
> 
> Is it worth rewriting this function into a simpler:
> 
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs until
> we are aligned
> check buffer_find_nonzero_offset on the aligned middle
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs at tail
> 
> instead of having two instances of code that can loop over the entire
> buffer?  Otherwise, searching for zeros on an unaligned buffer will
> remain slower, even though the bulk of the search could still benefit
> from the vector operations.

i do not think that this is necessary as in all cases I know the buffers are 
aligned properly.

a) ram pages (4k)
b) block migration pages (1M)
c) bdrv sectors (512)

At least on x86_64 they have all 16-byte aligned addresses. 

It would just add more branches to the case where the optimized version can be 
used maybe
making it slower.

Peter

> 
>> +    }
>> +
>>     /*
>>      * Use long as the biggest available internal data type that fits into 
>> the
>>      * CPU register and unroll the loop to smooth out the effect of memory
> 
> Your patch results in C99 declarations after statements; while we
> require C99, I'm not sure if qemu prefers to stick to the C89 style of
> declarations before statements.

the code was slower if I made the checks afterwards.

Peter

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCHv2 9/9] migration: use XBZRLE only after bulk stage, (continued)
- [Qemu-devel] [PATCHv2 9/9] migration: use XBZRLE only after bulk stage, Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 9/9] migration: use XBZRLE only after bulk stage, Eric Blake, 2013/03/19
- [Qemu-devel] [PATCHv2 8/9] migration: do not search dirty pages in bulk stage, Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 8/9] migration: do not search dirty pages in bulk stage, Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 8/9] migration: do not search dirty pages in bulk stage, Peter Lieven, 2013/03/19
- [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit(), Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit(), Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit(), Peter Lieven, 2013/03/19
- [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible, Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible, Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible, Peter Lieven <=
    - Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible, Peter Lieven, 2013/03/19
- [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Peter Lieven, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Peter Lieven, 2013/03/19
- [Qemu-devel] [PATCHv2 7/9] migration: do not sent zero pages in bulk stage, Peter Lieven, 2013/03/15
  - Re: [Qemu-devel] [PATCHv2 7/9] migration: do not sent zero pages in bulk stage, Eric Blake, 2013/03/19
    - Re: [Qemu-devel] [PATCHv2 7/9] migration: do not sent zero pages in bulk stage, Peter Lieven, 2013/03/19
- [Qemu-devel] [PATCHv2 5/9] migration: search for zero instead of dup pages, Peter Lieven, 2013/03/15

Prev by Date: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Next by Date: Re: [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer
Previous by thread: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Next by thread: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Index(es):
- Date
- Thread