Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop

From:	Stuart Brady
Subject:	Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
Date:	Fri, 23 Oct 2009 01:34:17 +0100
User-agent:	Mutt/1.5.13 (2006-08-11)

On Thu, Oct 15, 2009 at 11:14:52PM +0200, Aurelien Jarno wrote:
> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>  
>  uint32_t HELPER(clz)(uint32_t x)
>  {
> -    int count;
> -    for (count = 32; x; count--)
> -        x >>= 1;
> -    return count;
> +    return clz32(x);
>  }
>  
>  int32_t HELPER(sdiv)(int32_t num, int32_t den)

Just a quick note that the implementation of clz, ctz and popcnt is
still listed in the TCG TODO list.  The last time I looked, I noticed
that quite a few architectures have clz/ctz instructions:

   http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html

For those that don't, I think a combination the following two hacks at
http://graphics.stanford.edu/~seander/bithacks.html could be used:

   'Round up to the next highest power of 2'
   'Counting bits set, in parallel'

With this, it should be possible to implement clz and ctz without too
many operations for both 32-bit and 64-bit integers, without requiring
floats, lookup tables or branches.  Of course, __builtin_clz() might
well do a better job...

BTW, it may be worth pointing out:

   B[4] = 0x0000ffff;
   B[3] = B[4] ^ (B[4] << 8) => 0x00ff00ff
   B[2] = B[3] ^ (B[3] << 4) => 0x0f0f0f0f
   B[1] = B[2] ^ (B[2] << 2) => 0x33333333
   B[0] = B[1] ^ (B[1] << 1) => 0x55555555

In reality, I wonder if five separate loads would be quicker, though.

Cheers,
-- 
Stuart Brady

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Aurelien Jarno, 2009/10/15
- Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Laurent Desnogues, 2009/10/18
- Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Stuart Brady <=
  - Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Aurelien Jarno, 2009/10/23
    - Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Stuart Brady, 2009/10/23
    - Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop, Aurelien Jarno, 2009/10/23

Prev by Date: Re: [Qemu-devel] CODING_STYLE and if blocks
Next by Date: [Qemu-devel] Re: [PATCH] tcg, tci: Add TCG and interpreter for bytecode (virtual machine)
Previous by thread: Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
Next by thread: Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
Index(es):
- Date
- Thread