[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: |
Peter Lieven |
Subject: |
Re: [Qemu-devel] [RFC] find_next_bit optimizations |
Date: |
Mon, 11 Mar 2013 15:22:16 +0100 |
Am 11.03.2013 um 15:14 schrieb Paolo Bonzini <address@hidden>:
> Il 11/03/2013 14:44, Peter Lieven ha scritto:
>> Hi,
>>
>> I ever since had a few VMs which are very hard to migrate because of a
>> lot of memory I/O. I found that finding the next dirty bit
>> seemed to be one of the culprits (apart from removing locking which
>> Paolo is working on).
>>
>> I have to following proposal which seems to help a lot in my case. Just
>> wanted to have some feedback.
>> I applied the same unrolling idea like in buffer_is_zero().
>>
>> Peter
>>
>> --- a/util/bitops.c
>> +++ b/util/bitops.c
>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long
>> *addr, unsigned long size,
>> const unsigned long *p = addr + BITOP_WORD(offset);
>> unsigned long result = offset & ~(BITS_PER_LONG-1);
>> unsigned long tmp;
>> + unsigned long d0,d1,d2,d3;
>>
>> if (offset >= size) {
>> return size;
>> }
>> size -= result;
>> - offset %= BITS_PER_LONG;
>> + offset &= (BITS_PER_LONG-1);
>> if (offset) {
>> tmp = *(p++);
>> tmp &= (~0UL << offset);
>> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long
>> *addr, unsigned long size,
>> result += BITS_PER_LONG;
>> }
>> while (size & ~(BITS_PER_LONG-1)) {
>> + while (!(size & (4*BITS_PER_LONG-1))) {
>
> This really means
>
> if (!(size & (4*BITS_PER_LONG-1))) {
> while (1) {
> ...
> }
> }
>
> because the subtraction will not change the result of the "while" loop
> condition.
Are you sure? The above is working nicely for me (wondering why ;-))
I think !(size & (4*BITS_PER_LONG-1)) is the same as what you
propose. If size & (4*BITS_PER_LONG-1) is not zero its not dividable
by 4*BITS_PER_LONG. But it see it might be a problem for size == 0.
>
> What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
> in turn means "while (size >= 4*BITS_PER_LONG).
>
> Please change both while loops to use a ">=" condition, it's easier to read.
Good idea, its easier to understand.
Has anyone evidence if unrolling 4 longs is optimal on today processors?
I just chooses 4 longs because it was the same in buffer_is_zero.
Peter
>
> Paolo
>
>> + d0 = *p;
>> + d1 = *(p+1);
>> + d2 = *(p+2);
>> + d3 = *(p+3);
>> + if (d0 || d1 || d2 || d3) {
>> + break;
>> + }
>> + p+=4;
>> + result += 4*BITS_PER_LONG;
>> + size -= 4*BITS_PER_LONG;
>> + }
>> if ((tmp = *(p++))) {
>> goto found_middle;
>> }
>
- [Qemu-devel] [RFC] find_next_bit optimizations, Peter Lieven, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Peter Maydell, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Paolo Bonzini, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations,
Peter Lieven <=
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Peter Lieven, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Paolo Bonzini, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Peter Lieven, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Paolo Bonzini, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Peter Lieven, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Paolo Bonzini, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, ronnie sahlberg, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Paolo Bonzini, 2013/03/11
- Re: [Qemu-devel] [RFC] find_next_bit optimizations, Peter Lieven, 2013/03/11
- [Qemu-devel] [PATCH] bitops: unroll while loop in find_next_bit()., Peter Lieven, 2013/03/12