From: Peter Maydell
Subject: Re: [Qemu-devel] best way to implement emulation of AArch64 tagged addresses
Date: Fri, 8 Apr 2016 19:06:17 +0100

On 8 April 2016 at 18:20, Tom Hanson <address@hidden> wrote:
> On Mon, 2016-04-04 at 10:56 -0700, Richard Henderson wrote:
>> On 04/04/2016 09:31 AM, Peter Maydell wrote:
>> > On 4 April 2016 at 17:28, Richard Henderson <address@hidden> wrote:
>> >> On 04/04/2016 08:51 AM, Peter Maydell wrote:
>> >>> In particular I think if you just do the relevant handling of the tag
>> >>> bits in target-arm's get_phys_addr() and its subroutines then this
>> >>> should work ok, with the exceptions that:
>> >>>    * the QEMU TLB code will think that [tag A + address X] and
>> >>>      [tag B + address X] are different virtual addresses and they will
>> >>>      miss each other in the TLB
>> >>
>> >>
>> >> Yep.  Not only miss, but actively contend with each other.
>> >
>> > Yes. Can we avoid that, or do we just have to live with it? I guess
>> > if the TCG fast path is doing a compare on full insn+tag then we
>> > pretty much have to live with it.
>>
>> We have to live with it.  Implementing a more complex hashing algorithm
>> in the fast path is probably a non-starter.
>>
>> Hopefully if one is using multiple tags, they'll still be in the victim cache
>> and so you won't have to fall back to the full tlb lookup.
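
To make the contention concrete, here is a rough, self-contained C model of a
direct-mapped softmmu TLB. This is a sketch only: the sizes, struct and function
names below are invented for illustration and are not QEMU's actual code. The
slot index comes from low page-number bits, so two tags of the same address pick
the same slot, while the stored comparator keeps the tag, so each fill evicts
the other tag's entry:

    /* Illustrative model only, not QEMU's real TLB code. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TARGET_PAGE_BITS 12                    /* 4K guest pages */
    #define TLB_BITS         8                     /* 256-entry direct-mapped TLB */
    #define TLB_SIZE         (1u << TLB_BITS)
    #define PAGE_MASK        (~(((uint64_t)1 << TARGET_PAGE_BITS) - 1))

    typedef struct {
        uint64_t vaddr_cmp;   /* page-aligned guest vaddr, tag bits included */
    } TLBEntry;

    static TLBEntry tlb[TLB_SIZE];

    /* The index uses low page-number bits only, so a tag in bits [63:56]
     * never changes which slot we look in... */
    static unsigned tlb_index(uint64_t addr)
    {
        return (addr >> TARGET_PAGE_BITS) & (TLB_SIZE - 1);
    }

    /* ...but the comparison is against the full address, tag and all. */
    static bool tlb_hit(uint64_t addr)
    {
        return tlb[tlb_index(addr)].vaddr_cmp == (addr & PAGE_MASK);
    }

    static void tlb_fill(uint64_t addr)
    {
        tlb[tlb_index(addr)].vaddr_cmp = addr & PAGE_MASK;
    }

    int main(void)
    {
        uint64_t x = 0x0000004000001234ull;        /* untagged address */
        uint64_t a = 0x1100000000000000ull | x;    /* same page, tag 0x11 */
        uint64_t b = 0x2200000000000000ull | x;    /* same page, tag 0x22 */

        tlb_fill(a);
        printf("A hits: %d, B hits: %d\n", tlb_hit(a), tlb_hit(b));  /* 1, 0 */
        tlb_fill(b);                  /* same slot: evicts the tag-A entry */
        printf("A hits: %d, B hits: %d\n", tlb_hit(a), tlb_hit(b));  /* 0, 1 */
        return 0;
    }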

> It seems like the "best" solution would be to mask the tag in the TLB
> and it feels like it should be possible.  BUT I need to dig into the
> code more.
>
> Is it an option to mask off the tag bits in all cases? Is there any case
> in which those bits are valid address bits?

The problem, as Richard says, is that our fast path for guest
loads/stores is a bit of inline assembly that basically fishes
the right entry out of the TLB and compares it against the
input address (i.e. whatever guest address the load uses,
including the tag). A comparison match means we take the fast
path and do an inline access to the backing guest RAM. A mismatch
means we take the slow path (for TLB misses, IO devices, and
various other cases). Since the guest address that the fast
path sees includes the tag bits, if the TLB entry doesn't
include the tag bits then we'd need to do an extra mask operation
in the fast path, which is (a) not good for performance and
(b) would require modifying nine different TCG backends.
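
For illustration, the check reduces to something like the C below. This is a
sketch, not the real per-backend inline assembly; the masks, struct and function
names are invented. The tag-stripping variant shows the extra ALU operation
every backend would have to start emitting on every guest load and store:

    /* Sketch only: names and masks are illustrative, not QEMU's code. */
    #include <stdbool.h>
    #include <stdint.h>

    #define TARGET_PAGE_MASK (~(uint64_t)0xfff)      /* 4K pages */
    #define TAG_MASK         0x00ffffffffffffffull   /* clear tag bits [63:56] */

    typedef struct {
        uint64_t  addr_cmp;  /* page-aligned guest vaddr stored at TLB fill time */
        uintptr_t addend;    /* guest-vaddr to host-pointer offset, used on a hit */
    } TLBEntry;

    /* What the fast path effectively does today: align, compare, branch.
     * The tag travels along untouched, so a tagged address only matches an
     * entry that was filled with the same tag. */
    static inline bool fast_path_hit(const TLBEntry *e, uint64_t guest_addr)
    {
        return (guest_addr & TARGET_PAGE_MASK) == e->addr_cmp;
    }

    /* Hypothetical tag-ignoring variant: one more AND on the hot path of
     * every guest memory access, and each TCG backend would need to be
     * taught to emit it. */
    static inline bool fast_path_hit_untagged(const TLBEntry *e, uint64_t guest_addr)
    {
        return (guest_addr & TAG_MASK & TARGET_PAGE_MASK) == e->addr_cmp;
    }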

For a rarely used feature this is much too much effort (and
it slows down all the code that doesn't use tags for an
uncertain benefit to the code that does use them).

(If you're curious about the inline assembly, it's generated
by functions like tcg_out_tlb_load() in
tcg/i386/tcg-target.inc.c for the x86 backend; similarly for
the various other backends.)

thanks
-- PMM


