Re: [Qemu-devel] TCG: AREG0 removal planning


From: Blue Swirl
Subject: Re: [Qemu-devel] TCG: AREG0 removal planning
Date: Wed, 11 May 2011 20:32:31 +0300

On Wed, May 11, 2011 at 12:58 AM, Richard Henderson <address@hidden> wrote:
> On 05/10/2011 01:54 PM, Blue Swirl wrote:
>> TCG the generator backend
>> -AREG0 is used for qemu_ld/st ops for TLB access. It should be
>> possible for the translators to pass instead a pointer to either
>> CPUState or directly to the TLB.
>
> I believe that AREG0 should continue to be present in the generated
> code.  There are simply too many references to it throughout the
> translated code for allocating this dynamically to be a win.
>
> What should change, however, is the removal of AREG0 outside the
> generated code.  The cpu-state pointer should be passed as a regular
> parameter wherever it is required.  This includes tcg_qemu_tb_exec,
> which means that the generated prologue would change, setting up
> AREG0 in the process.

I have exactly the opposite feeling: AREG0 needs to be eliminated from
generated code, but converting the helpers won't be useful. Some
experiments may be needed to see what the real gains would be.
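
To make the two positions concrete, here is a minimal sketch of what
"passing the cpu-state pointer as a regular parameter" means for a
helper. Everything in it is an assumption for illustration: the reduced
CPUState struct, the plain global `env` standing in for the global
register variable behind AREG0, and both helper names are hypothetical
and not taken from the QEMU tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for QEMU's CPUState. */
typedef struct CPUState {
    uint32_t regs[8];
    uint32_t cc;        /* carry flag, for illustration only */
} CPUState;

/* Style A: the helper reaches CPU state through a global.  A plain
 * global pointer stands in here for the global register variable. */
static CPUState *env;

static uint32_t helper_add_global(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    env->cc = (r < a);  /* unsigned wrap-around means carry out */
    return r;
}

/* Style B: the same helper with the state pointer passed as an
 * ordinary first parameter, so no reserved register is needed. */
static uint32_t helper_add_param(CPUState *s, uint32_t a, uint32_t b)
{
    assert(s != NULL);
    uint32_t r = a + b;
    s->cc = (r < a);
    return r;
}
```

Both variants compute the same result; the open question in this thread
is which side of the generated-code boundary should pay for the extra
parameter, and only measurements can answer that.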

>> New qemu_ld/st ops are needed for all TCG targets.
>
> Yes, qemu_ld/st would have to change to accommodate the new parameter
> being passed.
>
> While we're at it, let us change things a bit further to allow guest
> byte-swap load/store insns to be implemented more efficiently.  For
> instance, currently a sparc load_asr (little-endian), as emulated on
> an x86 host, does the byte swap twice.
>
> There is, currently, a const int parameter to qemu_ld/st that encodes
> the size of the load.  Almost all TCG backends behind the scenes
> extend this parameter with a bit to indicate byte swap needed.  Let us
> formalize this, and allow this to be set in the original TCG op, with
> appropriate new inlines in tcg-op.h to access it from the translators.
>
> We can also make things easier for the backends by allowing them
> to declare that they do or do not have byte swap load/store insns.
> If such insns are not available, a separate bswap opcode is emitted
> right from tcg_gen_qemu_st32 et al.
>
> This would allow a nice cleanup for i386, which currently has a small
> register allocation problem in the store path, what with needing to
> not clobber the input register while byte swapping.  (This problem is
> solved by restricting the set of input registers for qemu_ld/st.)
>
> All this does require the slow path to be changed to accommodate this.
> In particular, if byte-swap memory ops are available, we need slow
> path functions that also byte swap.  Indeed, I'd expect them to use
> the byte-swap memory ops themselves.  Further, if byte-swap memory
> ops are not available, the slow path should always return memory in
> the host byte order, because a separate bswap operation will be done
> on behalf of the fast path.

I agree.
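
As a rough sketch of the byte-swap proposal quoted above, the size
parameter could carry one extra bit, and a slow-path style load could
honour it so that the swap happens exactly once. The encoding
(MEMOP_SIZE_MASK, MEMOP_BSWAP) and the function load_memop are
hypothetical names chosen for this example, not existing TCG
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: low two bits hold log2(size in bytes),
 * bit 2 requests a byte swap. */
enum {
    MEMOP_SIZE_MASK = 3,
    MEMOP_BSWAP     = 4
};

static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Slow-path style load: read 1, 2 or 4 bytes in host byte order and
 * swap once if the op asks for it.  Without MEMOP_BSWAP the value is
 * returned in host order, leaving any swap to a separate bswap op. */
static uint32_t load_memop(const uint8_t *p, int memop)
{
    int size = memop & MEMOP_SIZE_MASK;
    uint32_t val = 0;

    assert(size <= 2);
    switch (size) {
    case 0:
        val = p[0];             /* a single byte has nothing to swap */
        break;
    case 1: {
        uint16_t v;
        memcpy(&v, p, sizeof(v));
        val = (memop & MEMOP_BSWAP) ? bswap16(v) : v;
        break;
    }
    case 2: {
        uint32_t v;
        memcpy(&v, p, sizeof(v));
        val = (memop & MEMOP_BSWAP) ? bswap32(v) : v;
        break;
    }
    }
    return val;
}
```

A backend with native byte-swapping loads would replace the
memcpy-plus-bswap pair with a single instruction; a backend without
them would call the function without MEMOP_BSWAP and rely on the bswap
op emitted by the front end.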

>> -TCG temps are stored in CPUState field temp_buf[], accessed via
>> AREG0. Maybe a regular stack frame should be allocated instead?
>
> Probably.  Most of the backends manage a stack frame anyway, to
> handle registers saved in the prologue.  All that would be needed
> is a define from TCG to tell the backends how much memory is required,
> and some value passed from the backends to tell TCG what the offset
> of that area is from the stack pointer.

Currently the size of temp_buf is fixed, but the exact size for the
stack frame could be calculated during translation. Not that there
would be measurable performance improvements, though.
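
For illustration only, sizing the temp area from the number of temps
instead of using a fixed buffer could look like the sketch below. The
function names and parameters are hypothetical; nothing here reflects
how TCG actually lays out its frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes needed to spill nb_temps temporaries of
 * slot_size bytes each, rounded up to the stack alignment. */
static size_t temp_frame_size(size_t nb_temps, size_t slot_size,
                              size_t align)
{
    /* The rounding trick below requires a power-of-two alignment. */
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t bytes = nb_temps * slot_size;
    return (bytes + align - 1) & ~(align - 1);
}

/* Hypothetical: offset of temp number index from the stack pointer,
 * given the offset of the temp area reported by the backend. */
static size_t temp_slot_offset(size_t frame_base, size_t index,
                               size_t slot_size)
{
    return frame_base + index * slot_size;
}
```

With three 8-byte temps and 16-byte alignment this reserves 32 bytes.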


