
Re: [Qemu-devel] [PATCH 5/5] tcg/arm: improve direct jump


From: Laurent Desnogues
Subject: Re: [Qemu-devel] [PATCH 5/5] tcg/arm: improve direct jump
Date: Wed, 10 Oct 2012 16:43:32 +0200

On Wed, Oct 10, 2012 at 4:28 PM, Aurelien Jarno <address@hidden> wrote:
> On Wed, Oct 10, 2012 at 03:21:48PM +0200, Laurent Desnogues wrote:
>> On Tue, Oct 9, 2012 at 10:30 PM, Aurelien Jarno <address@hidden> wrote:
>> > Use ldr pc, [pc, #-4] kind of branch for direct jump. This removes the
>> > need to flush the icache on TB linking, and allows removing the limit
>> > on the code generation buffer.
>>
>> I'm not sure I like it.  In general having data in the middle
>> of code will increase I/D cache and I/D TLB pressure.
>
> Agreed. On the other hand, this patch removes the synchronization of
> the instruction cache for TB linking/unlinking.

TB linking/unlinking should happen less often than code execution.
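
To make the trade-off concrete, here is a rough sketch of the two ways
a jump site can be relinked. It is not the code from the patch: the
helper names and the flush counter are made up for illustration, and
addresses are plain 32-bit values standing in for host addresses. The
two encodings (0xea000000 for "b", 0xe51ff004 for "ldr pc, [pc, #-4]")
are the standard ARM-mode ones.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_B_AL       0xea000000u  /* b <imm24>, condition AL */
#define ARM_LDR_PC_M4  0xe51ff004u  /* ldr pc, [pc, #-4] */

/* Counts simulated icache synchronizations (illustration only). */
static unsigned icache_flushes;

/* Hypothetical stand-in for the host's icache flush routine. */
static void flush_icache_stub(uint32_t *start, uint32_t *end)
{
    (void)start;
    (void)end;
    icache_flushes++;
}

/* Scheme A: the jump site is a single "b". Relinking rewrites an
 * instruction word, so the icache has to be synchronized each time.
 * In ARM mode the PC reads as the instruction address + 8. */
static void patch_b(uint32_t *site, uint32_t site_addr, uint32_t target)
{
    int32_t disp = (int32_t)(target - (site_addr + 8));

    assert((disp & 3) == 0);  /* ARM instructions are word aligned */
    site[0] = ARM_B_AL | (((uint32_t)disp >> 2) & 0x00ffffffu);
    flush_icache_stub(site, site + 1);
}

/* Scheme B: "ldr pc, [pc, #-4]" followed by a literal word holding the
 * target. The instruction is written (and flushed) once at emission. */
static void emit_ldr_pc(uint32_t *site, uint32_t target)
{
    site[0] = ARM_LDR_PC_M4;
    site[1] = target;
    flush_icache_stub(site, site + 2);
}

/* Relinking scheme B only stores to the literal word, which is read
 * through the data side, so no icache flush is issued here. */
static void patch_ldr_pc(uint32_t *site, uint32_t target)
{
    site[1] = target;
}
```

The sketch also shows where my concern comes from: in scheme B the
literal word sits right next to the instruction that uses it.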

>> > This improves the boot-up speed of a MIPS guest by 11%.
>>
>> Boot speed is very specific.  Did you test some other code?
>> Also what was your host?
>
> I tested it on a Cortex-A8 machine. I have only tested MIPS, but I can
> do more tests, like running the openssl testsuite in the emulated guest.

Yes, please.

[...]
> This doesn't really surprise me. The goal of the patch is to remove the
> limit of 16MB for the generated code. I really doubt you reach such a
> limit in user mode unless you use some complex code.
>
> On the other hand, in system mode this limit can already be reached once
> the whole guest kernel is translated, so cached code is dropped and has to
> be re-translated regularly. Re-translating guest code is clearly more
> expensive than the increase of I/D cache and I/D TLB pressure.
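
For reference, the buffer limit follows from the reach of the ARM "b"
instruction: a signed 24-bit word offset relative to PC + 8, i.e.
roughly +/-32MB. A minimal sketch of that range check (the function
name is hypothetical, addresses are plain 32-bit values):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an ARM-mode "b" placed at insn_addr can reach target.
 * The encoded displacement is relative to insn_addr + 8 and must fit
 * in a signed 26-bit byte offset (24-bit immediate shifted left by 2). */
static bool arm_b_in_range(uint32_t insn_addr, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)insn_addr + 8);

    return disp >= -(INT64_C(1) << 25) && disp < (INT64_C(1) << 25);
}
```

Any two addresses inside a 16MB buffer pass this check; with a much
larger buffer, some TB-to-TB jumps would not.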

Ha yes, that's a real problem.  What about having some define
and/or runtime flag to keep both the cache-sync approach and your
ldr pc change in QEMU?
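
Something along these lines is what I have in mind. This is only a
sketch: the macro, the enum and the override convention are invented
here and are not existing QEMU options. The 16MB cap is the limit
discussed above.

```c
#include <stddef.h>

/* Hypothetical build-time default: 1 selects the ldr pc scheme. */
#ifndef TCG_ARM_USE_LDR_PC_JUMP
#define TCG_ARM_USE_LDR_PC_JUMP 1
#endif

enum jump_mode {
    JUMP_MODE_B,       /* patch a "b" instruction, sync the icache */
    JUMP_MODE_LDR_PC   /* ldr pc, [pc, #-4] plus a literal word */
};

/* runtime_override: 0 forces "b", 1 forces ldr pc, any other value
 * falls back to the build-time default. */
static enum jump_mode select_jump_mode(int runtime_override)
{
    if (runtime_override == 0) {
        return JUMP_MODE_B;
    }
    if (runtime_override == 1) {
        return JUMP_MODE_LDR_PC;
    }
    return TCG_ARM_USE_LDR_PC_JUMP ? JUMP_MODE_LDR_PC : JUMP_MODE_B;
}

/* Only the "b" scheme needs the code generation buffer capped. */
static size_t max_code_buffer_size(enum jump_mode mode, size_t requested)
{
    const size_t b_limit = (size_t)16 * 1024 * 1024;

    if (mode == JUMP_MODE_B && requested > b_limit) {
        return b_limit;
    }
    return requested;
}
```

User mode could then default to the "b" scheme and system mode to the
ldr pc one, if the numbers turn out that way.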

> The other way to allow more than 16MB of generated code would be to
> disable direct jump on ARM. It adds one 32-bit constant load + one
> memory load, but then you don't have the I/D cache and TLB issue.

The performance hit would be even worse :-)


Laurent


