Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize
From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit()
Date: Tue, 19 Mar 2013 20:40:50 +0100
On 19.03.2013 at 17:49, Eric Blake <address@hidden> wrote:
> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> this patch adds the usage of buffer_find_nonzero_offset()
>> to skip large areas of zeroes.
>>
>> compared to loop unrolling presented in an earlier
>> patch this adds another 50% performance benefit for
>> skipping large areas of zeroes. loop unrolling alone
>> added close to 100% speedup.
>>
>> Signed-off-by: Peter Lieven <address@hidden>
>> ---
>> util/bitops.c | 26 +++++++++++++++++++++++---
>> 1 file changed, 23 insertions(+), 3 deletions(-)
>
>> +    while (size >= BITS_PER_LONG) {
>> +        if ((tmp = *p)) {
>> +            goto found_middle;
>> +        }
>> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
>> +            && size >= BITS_PER_BYTE * sizeof(VECTYPE)
>> +               * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>
> Another instance where a helper function to check for alignment would be
> nice. Except this time you have a BITS_PER_BYTE factor, so you would be
> calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE)
>
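A minimal sketch of what such a helper could look like. The name buffer_can_use_vectors() is taken from the suggestion above and does not exist in the series as posted; VEC_SIZE and UNROLL_FACTOR are assumed stand-ins for sizeof(VECTYPE) and BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from the series. */
#define VEC_SIZE      16  /* stands in for sizeof(VECTYPE) */
#define UNROLL_FACTOR 8   /* stands in for BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR */

/* Hypothetical helper: true if buf is vector-aligned and len (in bytes)
 * covers at least one full unrolled iteration of the vector loop. */
static bool buffer_can_use_vectors(const void *buf, size_t len)
{
    assert(buf != NULL);
    return (uintptr_t)buf % VEC_SIZE == 0
           && len >= (size_t)VEC_SIZE * UNROLL_FACTOR;
}
```

Because size counts bits in find_next_bit(), a call site there would pass size / BITS_PER_BYTE as the length, as suggested above.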
>> +            unsigned long tmp2 =
>> +                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) &
>> +                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR *
>> +                             sizeof(VECTYPE) - 1)));
>
> Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't
> necessarily the same size as unsigned long. I'm not sure if it can bite
> you.
I will look into it.
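For reference, the two types differ on LLP64 targets such as 64-bit Windows, where size_t is 64 bits wide and unsigned long is 32 bits. One possible way to make the conversion explicit is sketched below; byte_offset_to_bits() is a hypothetical helper, not part of the patch, and it assumes the caller wants the offset converted to a bit count as in the lines that follow.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_BYTE 8

/* Hypothetical helper: convert a byte offset returned as size_t into a bit
 * count held in unsigned long, asserting that nothing is truncated. */
static unsigned long byte_offset_to_bits(size_t offset)
{
    assert(offset <= ULONG_MAX / BITS_PER_BYTE);
    return (unsigned long)offset * BITS_PER_BYTE;
}
```

If the returned offset never exceeds the length passed in, which here is derived from an unsigned long, the assertion cannot fire; it only documents that assumption.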
>
>> +            result += tmp2 * BITS_PER_BYTE;
>> +            size -= tmp2 * BITS_PER_BYTE;
>> +            p += tmp2 / sizeof(unsigned long);
>> +            if (!size) {
>> +                return result;
>> +            }
>> +            if (tmp2) {
>
> Do you really need this condition, or would it suffice to just
> 'continue;' the loop? Once buffer_find_nonzero_offset returns anything
> that leaves size as non-zero, we are guaranteed that the loop will goto
> found_middle without any further calls to buffer_find_nonzero_offset.
Not in all cases. A plain 'continue' suffices only if the non-zero content
lies in the first sizeof(unsigned long) bytes. If it does not,
buffer_find_nonzero_offset() is called again and returns 0, because the first
sizeof(VECTYPE)*BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR bytes contain a
non-zero byte. I added this check to avoid that second call.
Peter
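The difference can be reproduced with a simplified model. This is a rough sketch and not the patch itself: it works on byte offsets rather than bit counts, find_nonzero_chunk() is a scalar stand-in for buffer_find_nonzero_offset(), all names are hypothetical, and a 16-byte vector with an unroll factor of 8 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed for illustration: sizeof(VECTYPE) == 16 and an unroll factor of 8. */
#define VEC_ALIGN 16
#define CHUNK     (VEC_ALIGN * 8)

static unsigned scan_calls; /* how often the chunk scanner ran */

/* Scalar stand-in for buffer_find_nonzero_offset(): returns the offset of
 * the first CHUNK-sized block that contains a non-zero byte, or len if there
 * is none. len must be a multiple of CHUNK. */
static size_t find_nonzero_chunk(const unsigned char *buf, size_t len)
{
    scan_calls++;
    for (size_t off = 0; off < len; off += CHUNK) {
        for (size_t i = 0; i < CHUNK; i++) {
            if (buf[off + i]) {
                return off;
            }
        }
    }
    return len;
}

/* Byte offset of the first non-zero word in the len bytes at p, or len if
 * all are zero. check_after_skip selects the patch's behaviour (true) or a
 * bare 'continue' after a successful skip (false). */
static size_t first_nonzero_word(const unsigned long *p, size_t len,
                                 bool check_after_skip)
{
    size_t done = 0;

    while (len - done >= sizeof(*p)) {
        if (*p) {
            return done;
        }
        if ((uintptr_t)p % VEC_ALIGN == 0 && len - done >= CHUNK) {
            size_t skipped = find_nonzero_chunk((const unsigned char *)p,
                                 (len - done) & ~(size_t)(CHUNK - 1));
            done += skipped;
            p += skipped / sizeof(*p);
            if (done == len) {
                return len;
            }
            if (skipped) {
                if (!check_after_skip) {
                    continue; /* re-tests *p, then rescans the same chunk */
                }
                if (*p) {
                    return done;
                }
            }
        }
        p++;
        done += sizeof(*p);
    }
    return len;
}

/* Searches 4 chunks in which only the second word of the third chunk is
 * non-zero, and reports how many chunk scans were needed. */
static unsigned count_scans(bool check_after_skip)
{
    static unsigned long storage[4 * CHUNK / sizeof(unsigned long) + 4];
    unsigned long *buf = storage;
    size_t len = 4 * CHUNK;
    size_t hit = 2 * CHUNK / sizeof(*buf) + 1;

    while ((uintptr_t)buf % VEC_ALIGN) {
        buf++; /* advance to a VEC_ALIGN-aligned word */
    }
    memset(storage, 0, sizeof(storage));
    buf[hit] = 1;
    scan_calls = 0;
    size_t off = first_nonzero_word(buf, len, check_after_skip);
    assert(off == hit * sizeof(*buf));
    return scan_calls;
}
```

In this model count_scans(true) needs one chunk scan, while count_scans(false) needs two, because the second scan starts at the chunk that is already known to contain a non-zero byte and returns 0 immediately.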
>
>> +                if ((tmp = *p)) {
>> +                    goto found_middle;
>> +                }
>> +            }
>>          }
>> +        p++;
>>          result += BITS_PER_LONG;
>>          size -= BITS_PER_LONG;
>>      }
>>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>