[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
From: |
ltaylorsimpson |
Subject: |
RE: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage |
Date: |
Tue, 3 Dec 2024 17:37:37 -0700 |
> -----Original Message-----
> From: Brian Cain <brian.cain@oss.qualcomm.com>
> Sent: Tuesday, December 3, 2024 1:28 PM
> To: Anton Johansson <anjo@rev.ng>; Richard Henderson
> <richard.henderson@linaro.org>
> Cc: qemu-devel@nongnu.org; ale@rev.ng; ltaylorsimpson@gmail.com;
> bcain@quicinc.com; philmd@linaro.org; alex.bennee@linaro.org
> Subject: Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector
> storage
>
>
> On 12/3/2024 12:56 PM, Anton Johansson via wrote:
> > On 22/11/24, Richard Henderson wrote:
> >> On 11/20/24 19:49, Anton Johansson wrote:
> >>> Temporary vectors in helper-to-tcg generated code are allocated from
> >>> an array of bytes in CPUArchState, specified with --temp-vector-block.
> >>>
> >>> This commits adds such a block of memory to CPUArchState, if
> >>> CONFIG_HELPER_TO_TCG is set.
> >>>
> >>> Signed-off-by: Anton Johansson <anjo@rev.ng>
> >>> ---
> >>> target/hexagon/cpu.h | 4 ++++
> >>> 1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> >>> 7be4b5769e..fa6ac83e01 100644
> >>> --- a/target/hexagon/cpu.h
> >>> +++ b/target/hexagon/cpu.h
> >>> @@ -97,6 +97,10 @@ typedef struct CPUArchState {
> >>> MMVector future_VRegs[VECTOR_TEMPS_MAX]
> QEMU_ALIGNED(16);
> >>> MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> >>> +#ifdef CONFIG_HELPER_TO_TCG
> >>> + uint8_t tmp_vmem[4096] QEMU_ALIGNED(16); #endif
> >>> +
> >>> MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >>> MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >> Wow. Do you really require 4k in temp storage?
> > No, 4k is overkill used during testing. But consider that Hexagon
> > uses
> > 128- and 256-byte vectors in some cases so if the emitted code uses
> > say
> > 5 temporaries in its computation we end up at 1280 bytes as an upper
> > bound.
>
> Per-packet there should be a maximum of one temporary. But per-TB it's
> unbound. Could we/should we have some guidance to put the brakes on
> translation early if we encounter ~N temp references?
>
> But maybe that's not needed since the temp space can be reused within a TB
> among packets.
You should only need enough temporaries for one instruction. There are already
temporaries (future_VRegs, tmp_VRegs, future_QRegs) in CPUHexagonState to
handle the needs within a packet. There shouldn't be any temps needed between
the packets in a TB.
The number of temps needed for a given instruction is determined by the
compiler - version, level of optimization. So, you can determine this by
compiling all the instructions (i.e., build qemu). I'd recommend having a few
extra to future proof against changes to LLVM.
Taylor