qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Date: Tue, 19 Mar 2013 20:44:37 +0100

Am 19.03.2013 um 17:08 schrieb Eric Blake <address@hidden>:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>> 
>> Signed-off-by: Peter Lieven <address@hidden>
>> ---
>> util/cutils.c |    7 +++++++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 857dd7d..00d98fb 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -190,6 +190,13 @@ size_t buffer_find_nonzero_offset(const void *buf, 
>> size_t len)
>>  */
>> bool buffer_is_zero(const void *buf, size_t len)
>> {
>> +    /* use vector optimized zero check if possible */
>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0 
>> +          && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +             * sizeof(VECTYPE)) == 0) {
> 
> Is it worth factoring this check into something more reusable, by adding
> something like 'bool buffer_can_use_vectors(buf, len)' in patch 2/9?
> 
>> +        return buffer_find_nonzero_offset(buf, len)==len;
> 
> Spaces around binary operators.
> 
> Is it worth rewriting this function into a simpler:
> 
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs until
> we are aligned
> check buffer_find_nonzero_offset on the aligned middle
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs at tail
> 
> instead of having two instances of code that can loop over the entire
> buffer?  Otherwise, searching for zeros on an unaligned buffer will
> remain slower, even though the bulk of the search could still benefit
> from the vector operations.
> 
>> +    }
>> +
>>     /*
>>      * Use long as the biggest available internal data type that fits into 
>> the
>>      * CPU register and unroll the loop to smooth out the effect of memory
> 
> Your patch results in C99 declarations after statements; while we
> require C99, I'm not sure if qemu prefers to stick to the C89 style of
> declarations before statements.

See my comments to patch 7/9 regarding is_zero_page(). I will move the
call to the vector optimized version after the variable definitions. The
speed concern at that point was mainline to use it for zero page checking.

Peter

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]