qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Date: Tue, 19 Mar 2013 17:14:40 +0100

Am 19.03.2013 um 17:08 schrieb Eric Blake <address@hidden>:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>> 
>> Signed-off-by: Peter Lieven <address@hidden>
>> ---
>> util/cutils.c |    7 +++++++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 857dd7d..00d98fb 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -190,6 +190,13 @@ size_t buffer_find_nonzero_offset(const void *buf, 
>> size_t len)
>>  */
>> bool buffer_is_zero(const void *buf, size_t len)
>> {
>> +    /* use vector optimized zero check if possible */
>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0 
>> +          && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +             * sizeof(VECTYPE)) == 0) {
> 
> Is it worth factoring this check into something more reusable, by adding
> something like 'bool buffer_can_use_vectors(buf, len)' in patch 2/9?

good idea!

> 
>> +        return buffer_find_nonzero_offset(buf, len)==len;
> 
> Spaces around binary operators.
> 
> Is it worth rewriting this function into a simpler:
> 
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs until
> we are aligned
> check buffer_find_nonzero_offset on the aligned middle
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs at tail
> 
> instead of having two instances of code that can loop over the entire
> buffer?  Otherwise, searching for zeros on an unaligned buffer will
> remain slower, even though the bulk of the search could still benefit
> from the vector operations.

i do not think that this is necessary as in all cases I know the buffers are 
aligned properly.

a) ram pages (4k)
b) block migration pages (1M)
c) bdrv sectors (512)

At least on x86_64 they have all 16-byte aligned addresses. 

It would just add more branches to the case where the optimized version can be 
used maybe
making it slower.

Peter

> 
>> +    }
>> +
>>     /*
>>      * Use long as the biggest available internal data type that fits into 
>> the
>>      * CPU register and unroll the loop to smooth out the effect of memory
> 
> Your patch results in C99 declarations after statements; while we
> require C99, I'm not sure if qemu prefers to stick to the C89 style of
> declarations before statements.

the code was slower if I made the checks afterwards.

Peter

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]