[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] bitops: provide an inline implementation of fin
From: |
Aurelien Jarno |
Subject: |
Re: [Qemu-devel] [PATCH] bitops: provide an inline implementation of find_first_bit |
Date: |
Fri, 20 Jun 2014 10:48:06 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Thu, Jun 19, 2014 at 10:36:18AM +0200, Paolo Bonzini wrote:
> Il 22/12/2013 12:32, Aurelien Jarno ha scritto:
> >find_first_bit has started to be used heavily in TCG code. The current
> >implementation based on find_next_bit is not optimal and can't be
> >optimized be the compiler if the bit array has a fixed size, which is
> >the case most of the time.
>
> If you mean by fully unrolling the loop, that's right.
>
> However...
>
> >- return find_next_bit(addr, size, 0);
> >+ unsigned long result, tmp;
> >+
> >+ for (result = 0; result < size; result += BITS_PER_LONG) {
> >+ tmp = *addr++;
> >+ if (tmp) {
> >+ result += ctzl(tmp);
> >+ return result < size ? result : size;
> >+ }
> >+ }
> >+ /* Not found */
> >+ return size;
> > }
> >
> > /**
> >
>
> ... you probably want to limit this to bitmaps that are of constant
> size, and small enough that the compiler will unroll them.
>
> So it probably would be a good idea to add an
>
> if (!__builtin_constant_p(size) || size > 8 * BITS_PER_LONG)
> return find_next_bit(addr, size, 0);
>
> Not urgent since TCG is the only user of find_first_bit right now,
> but worth considering.
In practice on x86_64, this function takes 27 instructions in the
general case, and 18 instructions in the fixed case, even for big
sizes. I therefore think that checking if the size is constant is a good
idea, but we should not make any test on the size itself and trust the
compiler to correctly decide if the loop should be unrolled or not.
I will send a patch about that in the next days.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
address@hidden http://www.aurel32.net