qemu-devel

From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
Date: Wed, 15 Jul 2015 09:31:24 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1


On 15/07/2015 00:09, Aurelien Jarno wrote:
>> > 2) 64-bit processors that have loads with 32-bit addresses.
>> > 
>> >    => qemu_ld/qemu_st can use 32-bit addresses to do the
>> >       truncation
>> > 
>> >    aarch64, I think, falls under this group
> I don't think that works. We don't want to get a load with a 32-bit
> address. We want a load of (guest_base + address), with guest_base
> possibly being 64-bit, address being 32-bit and the result likely
> being 64-bit.

aarch64, IIUC, has complicated addressing modes with a 64-bit base and a
32-bit sign- or zero-extended index, which is exactly what you need
here.  However, the backend is not using it, so right now aarch64 is the
same as x86.
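As a rough illustration of the addressing mode described above, the address arithmetic of AArch64's [Xn, Wm, UXTW] and [Xn, Wm, SXTW] forms can be modeled in C. This is a sketch, not backend code. The function names are hypothetical. guest_base plays the role of the 64-bit base register and the 32-bit guest address is the index.

```c
#include <stdint.h>

/* Hypothetical model of [Xn, Wm, UXTW]: the 32-bit index is
 * zero-extended to 64 bits before being added to the 64-bit base. */
static uint64_t host_addr_uxtw(uint64_t guest_base, uint32_t guest_addr)
{
    return guest_base + (uint64_t)guest_addr;
}

/* Hypothetical model of [Xn, Wm, SXTW]: the 32-bit index is
 * sign-extended instead.  The xor/subtract trick performs the sign
 * extension in unsigned arithmetic, avoiding implementation-defined
 * signed conversions. */
static uint64_t host_addr_sxtw(uint64_t guest_base, uint32_t guest_addr)
{
    uint64_t idx = ((uint64_t)guest_addr ^ 0x80000000u) - 0x80000000u;
    return guest_base + idx;
}
```

Both forms fold the extension and the addition into the load itself, so no separate instruction is needed to clean up the upper half of the index register.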

> Well the use of ADDR32 is a bit special, it only works because we can
> use %gs to add the guest base address. When we can't use %gs, ADDR32
> can't work.

Yes.  bsd-user would have to sign extend, in particular.
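The two ADDR32 cases can be sketched as plain address arithmetic. This is an illustrative model with hypothetical function names, not the i386 backend's code. It relies on two x86-64 facts. The ADDR32 (0x67) prefix truncates the effective address computed from the register operands to 32 bits. A %gs segment base, when one is set, is added after that truncation.

```c
#include <stdint.h>

/* guest_base lives in the %gs segment base: ADDR32 truncates only the
 * register-based effective address, then the segment base is added. */
static uint64_t linear_addr32_gs(uint64_t gs_base, uint64_t addr_reg)
{
    uint64_t ea = (uint32_t)addr_reg;   /* 32-bit EA, zero-extended */
    return gs_base + ea;
}

/* No %gs: guest_base must be part of the register operands, so the
 * ADDR32 truncation also chops off the high bits of guest_base. */
static uint64_t linear_addr32_folded(uint64_t guest_base, uint64_t addr_reg)
{
    return (uint32_t)(guest_base + addr_reg);
}
```

For example, take gs_base = 0x100000000 and an address register holding 0xffffffff00001000. The first function yields 0x100001000, discarding the stale upper half. The second function, given address 0x1000 and the same guest_base, yields 0x1000 and loses guest_base entirely.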

> I don't think the register allocator is at fault at all. The function
> tcg_reg_alloc_mov doesn't check for the register type because a TCG mov
> is by definition only between registers of the same size.

Ok, I see your point.  If you put it like this :) the fault definitely
lies in the backends.  What I'm proposing would be in a new
tcg_reg_alloc_trunc function, and it would require implementing a
non-noop trunc.
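What a non-noop trunc would change can be shown with a toy model in which a host register is just a 64-bit value. The names are hypothetical and do not correspond to existing TCG functions; tcg_reg_alloc_trunc is only a proposal here. The one hardware fact the model relies on is that on x86-64 a 32-bit register write (e.g. movl %ebx,%ebp) zeroes bits 63:32 of the destination.

```c
#include <stdint.h>

/* Toy model: the contents of a 64-bit host register. */
typedef uint64_t host_reg;

/* No-op trunc_i64_i32: the i32 temp reuses the i64 value's register,
 * so bits 63:32 keep whatever the 64-bit value had. */
static host_reg trunc_noop(host_reg src)
{
    return src;
}

/* Non-noop trunc on x86-64: a 32-bit mov writes the low half and
 * zeroes the high half of the destination register. */
static host_reg trunc_movl(host_reg src)
{
    return (uint32_t)src;
}
```

Any backend instruction that later consumes the whole register, such as a 64-bit address computation, sees the stale upper half only in the no-op case.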

I still believe the register allocator can be improved to do 32-bit
loads, though as an optimization and not as a bugfix:

> > Even if the prefix was added, modifying the register allocator to use
> > 32-bit loads would still be useful as an optimization, since on x86
> > 32-bit loads are smaller than 64-bit loads.
>
> AFAIK, that's already the case. The REX.W prefix is only emitted for
> 64-bit ops.

Yes, but a load from a 64-bit register to a 32-bit destination emits
REX.W.  From Leon's dump:

 mov_i32 tmp1,w0.d0  => mov    0xe8(%r14),%rbp
 mov_i32 tmp0,tmp1
 mov_i32 t8,tmp0     => mov    %ebp,0x60(%r14)

Note %rbp as the load destination and %ebp as the source of the store.
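The size difference comes from the REX prefix. Below is a simplified x86-64 encoder for a load of the form disp(base) into a register (opcode 8B /r). It is a sketch with a hypothetical function name, not TCG's emitter. It always emits a displacement and rejects %rsp/%r12 as a base because those need a SIB byte. The saving is one prefix byte, and it only materializes when no other REX bit is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov disp(base), dst" (opcode 0x8B /r) as a 32-bit or 64-bit
 * load.  Registers use hardware numbers (rbx = 3, rbp = 5, r14 = 14).
 * Simplified: a displacement is always emitted, and bases whose low
 * three bits are 4 (%rsp, %r12) are rejected because they need a SIB. */
static size_t encode_load(uint8_t *buf, int dst, int base, int32_t disp,
                          int is64)
{
    size_t n = 0;
    uint8_t rex = 0x40;

    assert((base & 7) != 4);
    if (is64)
        rex |= 0x08;            /* REX.W: 64-bit operand size */
    if (dst >= 8)
        rex |= 0x04;            /* REX.R: extends ModRM.reg */
    if (base >= 8)
        rex |= 0x01;            /* REX.B: extends ModRM.rm */
    if (rex != 0x40)
        buf[n++] = rex;         /* emit REX only if some bit is set */

    buf[n++] = 0x8B;            /* MOV r32/r64, r/m32/r/m64 */

    int short_disp = disp >= -128 && disp <= 127;
    uint8_t mod = short_disp ? 1 : 2;   /* 01 = disp8, 10 = disp32 */
    buf[n++] = (uint8_t)((mod << 6) | ((dst & 7) << 3) | (base & 7));

    if (short_disp) {
        buf[n++] = (uint8_t)disp;
    } else {
        for (int i = 0; i < 4; i++)     /* little-endian disp32 */
            buf[n++] = (uint8_t)((uint32_t)disp >> (8 * i));
    }
    return n;
}
```

For the 0xe8(%r14) load in the dump, the 64-bit form encodes as 49 8b ae e8 00 00 00. The 32-bit form encodes as 41 8b ae e8 00 00 00, since %r14 already needs REX.B, so both are 7 bytes. With %rbx (3) as the base, the results are 7 and 6 bytes.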

Paolo


