Re: [Qemu-devel] [PATCH 0/7] tcg: conditional set and move opcodes


From: malc
Subject: Re: [Qemu-devel] [PATCH 0/7] tcg: conditional set and move opcodes
Date: Thu, 17 Dec 2009 20:47:17 +0300 (MSK)

On Thu, 17 Dec 2009, Richard Henderson wrote:

> On 12/17/2009 07:32 AM, malc wrote:
> > > These new opcodes are considered "required" by the backend,
> > > because expanding them at the tcg level breaks the basic block.
> > > There might be some way to emulate within tcg internals, but
> > > that doesn't seem worthwhile, as essentially all hosts have
> > > some form of support for these.
> ..
> >   c. Historically things like that were made conditional with
> >      a generic fallback (bswap, neg, not, rot, etc)
> 
> I answered this one above.  A generic fallback would break the
> basic block, which would break TCG's simple register allocation.

Urgh.. I really hate implementing those xxxx2 ops.

See for example this lovely thread:
http://www.archivum.info/address@hidden/2008-06/00306/%5BQemu-devel%5D_%5B4705%5D_Fix_div%5Bu%5D2.
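
For readers who have not met the xxxx2 ops: on a 32-bit host, a 64-bit comparison arrives as two 32-bit halves per operand, and the backend has to combine them. Below is a minimal C sketch of what a setcond2-style op has to compute. It is not QEMU code; the enum, the function name, and the argument order are made up for illustration, and only a few conditions are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition codes for this sketch; these are not TCG's values. */
enum cond { COND_EQ, COND_NE, COND_LT, COND_GE, COND_LTU, COND_GEU };

/*
 * Returns 1 if the 64-bit value (ah:al) compares true against (bh:bl)
 * under condition c, else 0, using only 32-bit comparisons.
 */
static uint32_t setcond2_model(enum cond c, uint32_t al, uint32_t ah,
                               uint32_t bl, uint32_t bh)
{
    bool eq = (al == bl) && (ah == bh);

    /* The high words decide unless they are equal; the low words are
     * always compared unsigned, even for a signed 64-bit comparison. */
    bool ltu = (ah != bh) ? (ah < bh) : (al < bl);
    bool lt  = (ah != bh) ? ((int32_t)ah < (int32_t)bh) : (al < bl);

    switch (c) {
    case COND_EQ:  return eq;
    case COND_NE:  return !eq;
    case COND_LT:  return lt;
    case COND_GE:  return !lt;
    case COND_LTU: return ltu;
    case COND_GEU: return !ltu;
    }
    return 0; /* unknown condition: treated as false in this sketch */
}
```

A real backend has to emit host instructions for each of these cases, which is where the tedium comes from.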

> 
> >   b. Documentation for movcond has a typo, t0 is assigned not t1
> 
> Oops.  Will fix.
> 
> >   d. Documentation for setcond2 is missing
> 
> Ah, I see that brcond2 is missing as well; I'll fix that too.
> 
> > It would also be interesting to learn what impact adding those two
> > has on performance, any results?
> 
> Hmph, not as much as I would have liked.  I suppose Intel is getting pretty
> darned good with its branch prediction.  It shaved about 3 minutes off
> 183.equake from what I posted earlier this week; that's something around a 7%
> improvement, assuming it's not just all noise (I haven't run that test enough
> times to see what the variation is).

If 3 minutes(!!) is only 7% then this test is a monster.

> 
> > +    case TCG_COND_NE:
> > +        if (const_arg2) {
> > +            if ((uint16_t) arg2 == arg2) {
> > +                tcg_out32 (s, XORI | RS (arg1) | RA (0) | arg2);
> > +            }
> > +            else {
> > +                tcg_out_movi (s, TCG_TYPE_I32, 0, arg2);
> > +                tcg_out32 (s, XOR | SAB (arg1, 0, 0));
> > +            }
> > +        }
> > +        else {
> > +            tcg_out32 (s, XOR | SAB (arg1, 0, arg2));
> > +        }
> > +
> > +        tcg_out32 (s, ADDIC | RT (arg0) | RA (0) | 0xffff);
> > +        tcg_out32 (s, SUBFE | TAB (arg0, arg0, 0));
> > +        return;
> 
> Heh, you know a trick that gcc doesn't for powerpc.  It just adds an xor at
> the end of the EQ sequence.

Well, truth be told, I just looked at what gcc 4.4.1 produces for:

    return op1 != op2 ? 1 : 0;
               ==

And did the same. FWIW, gcc's handling of LT, LE, GT, GE is not as naive as
this implementation (it avoids CR ops/moves when op2 is an immediate), but
I'm not sure the gain is worth the pain, so I left it as is for
simplicity's sake.
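
To spell out why the NE sequence quoted above yields 0 or 1, here is a small C model of the three instructions (xor, addic with -1, subfe). It is only a sketch: the function name is made up, and the carry bit is modelled by hand with a 64-bit intermediate rather than taken from real hardware.

```c
#include <stdint.h>

/*
 * Models the 32-bit PowerPC sequence:
 *   xor    r0, a, b      ; r0 = a ^ b, zero iff a == b
 *   addic  rt, r0, -1    ; rt = r0 - 1, CA = carry out of r0 + 0xffffffff
 *   subfe  rt, rt, r0    ; rt = ~rt + r0 + CA
 */
static uint32_t setcond_ne_model(uint32_t a, uint32_t b)
{
    uint32_t r0 = a ^ b;

    /* addic: adding 0xffffffff carries out exactly when r0 != 0. */
    uint64_t wide = (uint64_t)r0 + 0xffffffffu;
    uint32_t rt = (uint32_t)wide;
    uint32_t ca = (uint32_t)(wide >> 32);

    /* subfe: ~(r0 - 1) equals -r0 modulo 2^32, so the sum collapses to CA. */
    return ~rt + r0 + ca;
}
```

The result is therefore just the carry bit: 1 when the operands differ, 0 when they are equal.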

P.S. BTW, PPC has the same dilemma w.r.t. conditional moves: on x86 it's
     cmov that is not universally available, on PPC it's isel. Given
     that there's also some fruit in going with LDBRX where available,
     I guess it's worthwhile investigating a proper interface for
     querying the host CPU features...

-- 
mailto:address@hidden



