qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Re: [rfc] rewrite tcg-ia64 for auto-bundling


From: Aurelien Jarno
Subject: [Qemu-devel] Re: [rfc] rewrite tcg-ia64 for auto-bundling
Date: Mon, 10 Jan 2011 01:45:09 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

On Sun, Jan 09, 2011 at 03:24:44PM -0800, Richard Henderson wrote:
> Something I noticed while working on that deposit patch for ia64
> was how annoying it is to bundle instructions by hand.  And the
> fact that it's been done incorrectly at least once.  E.g.
> 
> static inline void tcg_out_bswap64(TCGContext *s, TCGArg ret, TCGArg arg)
> {
>     tcg_out_bundle(s, mII,
>                    tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
>                    tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0),
>                    tcg_opc_i3 (TCG_REG_P0, OPC_MUX1_I3, ret, arg, 0xb));
> 
> Notice that there's an unnecessary stop bit in there.

Agreed this is prone to errors.

> This patch has does too much all at once, I'll admit that.  But
> before I bother going back to split it into smaller pieces, I
> wanted to get some feedback -- including whether it would be
> considered at all.
> 
> Some statistics that I gleaned from grepping -d out_asm.  Both before
> and after the rewrite, the only two linux-user targets that work are
> i386 and alpha.  Clearly there are still bugs to fix, but it's not
> necessarily a regression.

The problem with linux-user is that ia64 has 16kiB pages, and we don't
really support this when emulating 4kiB guests.

>               Code size                       Stop bits
>                   old     new change             old     new  change
> i386/ls               2309712 1602736 -31%            215520   98698  -54%
> alpha/ls      1088352  817888 -25%            106215   48440  -54%
> 
> On the system side, I am able to boot the arm-test kernel.  But
> it's a bit harder to get repeatable statistics there.  I assume
> that's all timing issues, and what code gets run when, and how
> that affects the TB cache.

When writing the IA64 backend, I also started trying to bundle the
instructions automatically, but stopped because it was getting nowhere.
It's an ambitious project but if you can get it working, I am sure it'll
really improve QEMU on IA64.

The problems that need to be taken into account when automatically
bundling the code are mostly due to the partial code retranslation
(look at tcg_gen_code_search_pc()):
- retranslating only a part of the ops (always starting at the beginning
  of a TB, but not up to the end) should not change the already 
  generated code, even temporarily.
- it should be possible to identify the op which called an helper or
  the softmmu code with __builtin_return_address(), and thus identify
  which part of the bundle. s->code_ptr should therefore be incremented
  the same way that __builtin_return_address() is.

Other than that, it's always a trade-off between complexity of the code
generator and the compactness of the generated code. It doesn't worth
generated a 20% more optimized code if it takes 10 times more time to
generate it. The problem with that is that you can't really know before
trying, at least partially.

I have tried to measure the speed improvement on the i386 target, but it
triggers a tcg_abort() while booting the kernel, so it's not possible to
conclude on the speed side yet.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]