
Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations


From: Filip Navara
Subject: Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
Date: Mon, 29 Jun 2009 20:26:35 +0200

On Mon, Jun 29, 2009 at 7:59 PM, Laurent
Desnogues<address@hidden> wrote:
> On Mon, Jun 29, 2009 at 7:50 PM, Filip Navara<address@hidden> wrote:
>>
>> Big thanks goes to Laurent Desnogues who actually had suggested where
>> the bottlenecks are.
>
> IIRC it's Paul who suggested that idea first on IRC months
> ago,

Yeah, this Paul guy keeps coming up with good ideas lately. His work
helped me a lot in writing my bachelor thesis and saved me countless
hours. I owe him a beverage at the very least, but somehow I doubt he
will come to my little country any time soon.

> as I was complaining about the stupidity of the generated
> code :-)

Let's keep complaining, maybe someone will improve it over time.

With the applied patches the OP statistics now look like this:

mov_i32 1925
movi_i32 1556
add_i32 518
ld_i32 257
exit_tb 247
brcond_i32 225
qemu_ld32u 219
set_label 207
...

Some minor improvements could be made to the usage of TCG temporary
variables in target-arm/translate.c. That could be done gradually and
without any substantial effort. It would probably increase the speed
by about 1 to 5 percent.

Another idea is to group blocks of conditional instructions to avoid
unnecessary jumps. That would help with code like this:

0x00200d28:  cmp        lr, #0  ; 0x0
0x00200d2c:  movle      r0, #1  ; 0x1
0x00200d30:  movle      r1, r5
0x00200d34:  movle      ip, r0
0x00200d38:  ble        0x200d64

I'm not sure how common this pattern is and I haven't done any further
investigation yet.
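
To make the grouping idea a bit more concrete, here is a rough model
in C. It is not QEMU code: the Insn struct and both functions are
made up for illustration. It only counts how many "skip if the
condition fails" branches would be needed when every conditional
instruction is guarded on its own versus when a run of instructions
with the same condition shares one guard. It assumes a guard cannot be
shared across an instruction that changes the flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ARM condition field encodings used below. */
enum { COND_LE = 13, COND_AL = 14 };

/* Hypothetical, simplified view of a decoded guest instruction. */
typedef struct {
    int cond;         /* condition field, COND_AL means "always" */
    bool sets_flags;  /* true for cmp, adds, ... */
} Insn;

/* One guard branch per conditional instruction. */
size_t count_guards_naive(const Insn *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    size_t guards = 0;
    for (size_t i = 0; i < n; i++) {
        if (insns[i].cond != COND_AL) {
            guards++;
        }
    }
    return guards;
}

/* One guard branch per run of instructions sharing a condition. */
size_t count_guards_grouped(const Insn *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    size_t guards = 0;
    int open_cond = COND_AL;  /* condition of the currently open group */
    for (size_t i = 0; i < n; i++) {
        if (insns[i].cond == COND_AL) {
            open_cond = COND_AL;           /* unconditional: close the group */
        } else if (insns[i].cond != open_cond) {
            guards++;                      /* start a new guarded group */
            open_cond = insns[i].cond;
        }
        if (insns[i].sets_flags) {
            open_cond = COND_AL;           /* flags changed: next insn needs a fresh test */
        }
    }
    return guards;
}
```

For the five instructions quoted above (cmp followed by three movle
and a ble), this model counts four guards in the per-instruction
scheme and one in the grouped scheme. How much that is worth in
practice depends on how often such runs occur.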

Lastly, the code generated for softmmu memory loads/stores could
probably be optimized in some cases. It uses hard-coded registers.
It's not optimized for multiple stores to adjacent locations (pushing
multiple registers to stack) and does all the calculations again and
again. This results not only in recomputing numbers we already have
(as long as the stack is still on the same guest page), but also in
huge TBs. I imagine that doesn't help the processor cache too much.
This would probably benefit all targets. In fact, I believe the
softmmu code could be moved out of the TCG target-specific code and
into the main code (with the possibility of overriding it with an
optimized version).
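
As a rough illustration of the adjacent-stores point, the following C
sketch counts address translations for a push of several registers. It
is a toy model, not the softmmu fast path: PAGE_BITS and the function
name are assumptions chosen for the example, and a "lookup" here just
stands for redoing the whole page calculation. The per-store scheme
needs one lookup per word, while reusing the result for as long as the
address stays on the same guest page needs one per page touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed guest page size of 4 KiB; the real value depends on the target. */
#define PAGE_BITS 12u
#define PAGE_MASK (~((UINT32_C(1) << PAGE_BITS) - 1))

/*
 * Count the lookups needed to store n_words consecutive 32-bit words
 * starting at base when the last translated page is remembered.
 * The per-store scheme would need exactly n_words lookups.
 */
size_t lookups_page_cached(uint32_t base, size_t n_words)
{
    assert((base & 3u) == 0);  /* the model assumes word-aligned stores */

    size_t lookups = 0;
    uint32_t cached_page = 0;
    int have_page = 0;

    for (size_t i = 0; i < n_words; i++) {
        uint32_t addr = base + (uint32_t)(4 * i);
        uint32_t page = addr & PAGE_MASK;
        if (!have_page || page != cached_page) {
            lookups++;              /* new page: translate again */
            cached_page = page;
            have_page = 1;
        }
    }
    return lookups;
}
```

With these assumptions, pushing four registers at 0x1000 costs one
lookup instead of four, and a push that straddles a page boundary
(for example starting at 0x1ff8) costs two.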

Best regards,
Filip Navara
