|
From: | LIU Zhiwei |
Subject: | Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and store instructions |
Date: | Wed, 8 Jan 2020 09:32:33 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1 |
As discussed in part1, I will check these conditions at translate time.+static bool vector_lmul_check_reg(CPURISCVState *env, uint32_t lmul, + uint32_t reg, bool widen) +{ + int legal = widen ? (lmul * 2) : lmul; + + if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) || + (lmul == 8 && widen)) { + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); + return false; + } + + if (reg % legal != 0) { + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); + return false; + } + return true; +}These exceptions will not do the right thing. You cannot call helper_raise_exception from another helper, or from something called from another helper, as here. You need to use riscv_raise_exception, as you do elsewhere in this patch, with a GETPC() value passed down from the outermost helper. Ideally you would check these conditions at translate time. I've mentioned how to do this in reply to your v1.
There are two questions here.+ } else if (i < vl) { + switch (width) { + case 8: + if (vector_elem_mask(env, vm, width, lmul, i)) { + while (k >= 0) { + read = i * (nf + 1) + k; + env->vfp.vreg[dest + k * lmul].u8[j] = + cpu_ldub_data(env, env->gpr[rs1] + read);You must not modify vreg[x] before you've recognized all possible exceptions, e.g. validating that a subsequent access will not trigger a page fault. Otherwise you will have a partially modified register value when the exception handler is entered.
"instructions that perform more than one host store must probe the entire range to be stored before performing any stores. "I didn't see the validation of page in SVE, for example, sve_st1_r,
entry = tlb_entry(env, mmu_idx, addr); if (unlikely(entry->ADDR_READ != (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) { oi = make_memop_idx(SHIFT, mmu_idx); res = glue(glue(helper_ret_ld, URETSUFFIX), MMUSUFFIX)(env, addr, oi, retaddr); } else { uintptr_t hostaddr = addr + entry->addend; res = glue(glue(ld, USUFFIX), _p)((uint8_t *)hostaddr); } |
Without a stride, and without a predicate mask, this can be done with at most two calls to probe_access (one per page). This is the simplification that makes splitting the helper into two very helpful. With a stride or with a predicate mask requires either (1) temporary storage for the loads, and copy back to env at the end, or (2) use probe_access for each load, and then perform the actual loads directly into env. FWIW, ARM SVE uses (1), as probe_access is very new.
I will take it. However I didn't have a big-endian host to test the feature.+ k--; + } + env->vfp.vstart++; + } + break; + case 16: + if (vector_elem_mask(env, vm, width, lmul, i)) { + while (k >= 0) { + read = i * (nf + 1) + k; + env->vfp.vreg[dest + k * lmul].u16[j] = + cpu_ldub_data(env, env->gpr[rs1] + read);I don't see anything in these assignments to vreg[x].uN[y] that take the endianness of the host into account. You need to think about how the architecture defines the overlap of elements -- particularly across vlset -- and make adjustments. I can imagine, if you have explicit tests for this, your tests are passing because the architecture defines a little-endian based indexing of the register file, and you have only run tests on a little-endian host, like x86_64. For ARM, we define the representation as a little-endian indexed array of host-endian uint64_t. This means that a big-endian host needs to adjust the address of any element smaller than 64-bit. E.g. #ifdef HOST_WORDS_BIGENDIAN #define H1(x) ((x) ^ 7) #define H2(x) ((x) ^ 3) #define H4(x) ((x) ^ 1) #else #define H1(x) (x) #define H2(x) (x) #define H4(x) (x) #endif env->vfp.vreg[reg + k * lmul].u16[H2(j)]
Thanks again for your informative comments.
+ if (base >= abs_off) { + return base - abs_off; + } + } else { + if ((target_ulong)((target_ulong)offset + base) >= base) { + return (target_ulong)offset + base; + } + }Why all the extra casting here? They are exactly what is implied by C.+ helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); + return 0;(1) This exception call won't work, as above, (2) Where does this condition against wraparound come from? I don't see it in the specification. (3) You certainly cannot detect this after having written a previous element to the register file. [ Skipping lots of functions that are basically the same. ]+void VECTOR_HELPER(vsxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm, + uint32_t rs1, uint32_t rs2, uint32_t rd)Pass rs1 by value.+ case 8: + if (vector_elem_mask(env, vm, width, lmul, i)) { + while (k >= 0) { + addr = vector_get_index(env, rs1, src2, j, 1, width, k); + cpu_stb_data(env, addr, + env->vfp.vreg[dest + k * lmul].s8[j]);Must probe_access all of the memory before any stores. Unlike loads, you don't have the option of storing into a temporary. Which suggests a common subroutine to perform the probe(s), rather than bother with a temporary for loads.r~
[Prev in Thread] | Current Thread | [Next in Thread] |