[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x
From: |
Nikunj A Dadhania |
Subject: |
Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x |
Date: |
Wed, 10 Aug 2016 14:51:25 +0530 |
User-agent: |
Notmuch/0.21 (https://notmuchmail.org) Emacs/25.0.94.1 (x86_64-redhat-linux-gnu) |
Richard Henderson <address@hidden> writes:
> On 08/08/2016 10:57 AM, Richard Henderson wrote:
>> On 08/07/2016 11:06 PM, Nikunj A Dadhania wrote:
>>> +#define LXV(name, access, swap, type, elems) \
>>> +uint64_t helper_##name(CPUPPCState *env, \
>>> + target_ulong addr) \
>>> +{ \
>>> + type r[elems] = {0}; \
>>> + int i, index, bound, step; \
>>> + if (msr_le) { \
>>> + index = elems - 1; \
>>> + bound = -1; \
>>> + step = -1; \
>>> + } else { \
>>> + index = 0; \
>>> + bound = elems; \
>>> + step = 1; \
>>> + } \
>>> + \
>>> + for (i = index; i != bound; i += step) { \
>>> + if (needs_byteswap(env)) { \
>>> + r[i] = swap(access(env, addr, GETPC())); \
>>> + } else { \
>>> + r[i] = access(env, addr, GETPC()); \
>>> + } \
>>> + addr = addr_add(env, addr, sizeof(type)); \
>>> + } \
>>> + return *((uint64_t *)r); \
>>> +}
>>
>> This looks more complicated than necessary.
>>
>> (1) In big-endian mode, surely this simplifies to two 64-bit big-endian
>> loads.
>>
>> (2) In little-endian mode, the overhead of accessing memory surely dominates,
>> and therefore we should perform two 64-bit loads and manipulate the data
>> after.
>>
>> AFAICS, this is easiest done by requesting two 64-bit *big-endian* loads, and
>> then swapping bytes. E.g.
>>
>> uint64_t helper_bswap16x4(uint64_t x)
>> {
>> uint64_t m = 0x00ff00ff00ff00ffull;
>> return ((x & m) << 8) | ((x >> 8) & m);
>> }
>>
>> uint64_t helper_bswap32x2(uint64_t x)
>> {
>> return deposit64(bswap32(x >> 32), 32, 32, bswap32(x));
>> }
>
> To correct myself, this big-endian load really only makes sense for lxvh8x.
> For lxvw4x, a little-endian load with a word swap is fewer operations. I.e.
>
> tcg_gen_qemu_ld_i64(t0, addr, ctx->mem_idx, MO_LEQ);
> tcg_gen_shri_i64(t1, t0, 32);
> tcg_gen_deposit_i64(dest, t1, t0, 32, 32);
I have following for lxvw4x:
if (ctx->le_mode) {
t0 = tcg_temp_new_i64();
t1 = tcg_temp_new_i64();
tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
tcg_gen_shri_i64(t1, t0, 32);
tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
tcg_gen_addi_tl(EA, EA, 8);
tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
tcg_gen_shri_i64(t1, t0, 32);
tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
tcg_temp_free_i64(t0);
tcg_temp_free_i64(t1);
} else {
tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
tcg_gen_addi_tl(EA, EA, 8);
tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
}
Here is the test code:
__vector uint32_t vrt32;
uint32_t rb32[4] = {0x00010203, 0x20212223, 0x30313233, 0x40414243};
asm("lxvw4x %x0, 0, %1 \n\t" \
: "=ws"(vrt32) : "r"(&rb32));
printf("VRT32 = "); vec_put_u32(vrt32);
Result On LE:
VRT32 = 40414243 30313233 20212223 00010203
Result On BE:
VRT32 = 03020100 23222120 33323130 43424140
I had have expected the the BE result to be
VRT32 = 00010203 20212223 30313233 40414243
But seems there is something going under the hoods. Am I missing
something here?
I can fix the BE case using following but not sure if that will be
correct !
tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_Q);
gen_helper_bswap32x2(xth, xth);
tcg_gen_addi_tl(EA, EA, 8);
tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_Q);
gen_helper_bswap32x2(xtl, xtl);
uint64_t helper_bswap32x2(uint64_t x)
{
return deposit64((x >> 32), 32, 32, (x));
}
And then I get the expected result:
VRT32 = 00010203 20212223 30313233 40414243
Regards,
Nikunj
- Re: [Qemu-devel] [PATCH 2/6] target-ppc: Implement darn instruction, (continued)
- [Qemu-devel] [PATCH 3/6] target-ppc: add lxsi[bw]zx instruction, Nikunj A Dadhania, 2016/08/07
- [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x, Nikunj A Dadhania, 2016/08/07
- [Qemu-devel] [PATCH 4/6] target-ppc: add stxsi[bh]x instruction, Nikunj A Dadhania, 2016/08/07
- [Qemu-devel] [PATCH 6/6] target-ppc: add stxvb16x and stxvh8x, Nikunj A Dadhania, 2016/08/07
- Re: [Qemu-devel] [PATCH 0/6] POWER9 TCG enablements - part4, David Gibson, 2016/08/08