qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage


From: ltaylorsimpson
Subject: RE: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
Date: Tue, 3 Dec 2024 17:37:37 -0700


> -----Original Message-----
> From: Brian Cain <brian.cain@oss.qualcomm.com>
> Sent: Tuesday, December 3, 2024 1:28 PM
> To: Anton Johansson <anjo@rev.ng>; Richard Henderson
> <richard.henderson@linaro.org>
> Cc: qemu-devel@nongnu.org; ale@rev.ng; ltaylorsimpson@gmail.com;
> bcain@quicinc.com; philmd@linaro.org; alex.bennee@linaro.org
> Subject: Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector
> storage
> 
> 
> On 12/3/2024 12:56 PM, Anton Johansson via wrote:
> > On 22/11/24, Richard Henderson wrote:
> >> On 11/20/24 19:49, Anton Johansson wrote:
> >>> Temporary vectors in helper-to-tcg generated code are allocated from
> >>> an array of bytes in CPUArchState, specified with --temp-vector-block.
> >>>
> >>> This commits adds such a block of memory to CPUArchState, if
> >>> CONFIG_HELPER_TO_TCG is set.
> >>>
> >>> Signed-off-by: Anton Johansson <anjo@rev.ng>
> >>> ---
> >>>    target/hexagon/cpu.h | 4 ++++
> >>>    1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> >>> 7be4b5769e..fa6ac83e01 100644
> >>> --- a/target/hexagon/cpu.h
> >>> +++ b/target/hexagon/cpu.h
> >>> @@ -97,6 +97,10 @@ typedef struct CPUArchState {
> >>>        MMVector future_VRegs[VECTOR_TEMPS_MAX]
> QEMU_ALIGNED(16);
> >>>        MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> >>> +#ifdef CONFIG_HELPER_TO_TCG
> >>> +    uint8_t tmp_vmem[4096] QEMU_ALIGNED(16); #endif
> >>> +
> >>>        MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >>>        MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >> Wow.  Do you really require 4k in temp storage?
> > No, 4k is overkill used during testing.  But consider that Hexagon
> > uses
> > 128- and 256-byte vectors in some cases so if the emitted code uses
> > say
> > 5 temporaries in its computation we end up at 1280 bytes as an upper
> > bound.
> 
> Per-packet there should be a maximum of one temporary.  But per-TB it's
> unbound.  Could we/should we have some guidance to put the brakes on
> translation early if we encounter ~N temp references?
> 
> But maybe that's not needed since the temp space can be reused within a TB
> among packets.

You should only need enough temporaries for one instruction.  There are already 
temporaries (future_VRegs, tmp_VRegs, future_QRegs) in CPUHexagonState to 
handle the needs within a packet.  There shouldn't be any temps needed between 
the packets in a TB.

The number of temps needed for a given instruction is determined by the 
compiler - version, level of optimization.  So, you can determine this by 
compiling all the instructions (i.e., build qemu).  I'd recommend having a few 
extra to future proof against changes to LLVM.

Taylor





reply via email to

[Prev in Thread] Current Thread [Next in Thread]