[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Possible ppc comparision optimisation
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] Possible ppc comparision optimisation |
Date: |
Wed, 08 May 2013 18:16:43 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4 |
Il 08/05/2013 17:44, Torbjorn Granlund ha scritto:
> Paolo Bonzini <address@hidden> writes:
>
> I think that would be faster on 32-bit hosts, truncs are cheap.
>
> And slower perhaps on 64-bit hosts, at least for operations where
> additional explicit trunctation will be needed (such as before
> comparisions and after right shifts).
>
> > There could be a disadvantage of this compared to the old code, since
> > this has a chained algebraic dependency, while the old code's many
> > instructions might have been more independent.
>
> What about these alternatives:
>
> setcond LT, t0, arg0, arg1
> setcond EQ, t1, arg0, arg1
> trunc s0, t0
> trunc s1, t1
> shli s0, s0, 1 ; s0 = (arg0 < arg1) ? 2 : 0
> subi s1, s1, 2 ; s1 = (arg0 != arg1) ? -2 : -1
> sub s0, s0, s1 ; < 4 == 1 > 2
> shli s0, s0, 1 ; < 8 == 2 > 4
>
> =======
>
> setcond LT, t0, arg0, arg1
> setcond NE, t1, arg0, arg1
> trunc s0, t0
> trunc s1, t1
> add s0, s0, s1 ; < 2 == 0 > 1
> movi s1, 1
> add s0, s0, s1 ; < 3 == 1 > 2
> shl s1, s1, s0 ; < 8 == 2 > 4
>
> Surely there are many alternative forms.
> Is your aim to add micro-parallelism?
Yes, I think in this respect I think the first one is better. The
second could be three instructions on machines that have a set-nth-bit
instruction _and_ a zero register, but I'm not sure they exist...
> (Your sequences look a bit curious. Did you use a super-optimiser?)
No, but I am attracted to these curious sequences from my previous life
working on compilers. :) I know your superoptimizer and, in fact, we
both worked on some parts of GCC (optimization of conditional
branches/stores), just 20 years apart.
The second is actually not too curious after you look at it for a while,
it is a variant of the usual (x > y) + (x >= y) trick used to generate a
0/1/2 result. The first I found by trial and error based on yours; it
is basically (x < y) * 2 - (x == y) + 2, with some reordering to get
parallelism and avoid the need for subfi-like instructions.
Paolo