Re: [Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd ins
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd instruction
Date: Wed, 26 Jun 2019 17:37:16 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 6/19/19 1:03 PM, Stefan Brankovic wrote:
> Optimize the Altivec instruction vgbbd (Vector Gather Bits by Bytes by
> Doubleword). All ith bits (i in range 1 to 8) of each byte of a doubleword
> element in the source register are concatenated and placed into the ith
> byte of the appropriate doubleword element in the destination register.
>
> The following solution handles both doubleword elements of the source
> register in parallel, in order to reduce the number of instructions needed
> (that's why arrays are used):
> First, both doubleword elements of source register vB are placed in the
> appropriate element of array avr. Bits are gathered in 2x8 iterations (2 for
> loops). In the first iteration, bit 1 of byte 1, bit 2 of byte 2, ... bit 8
> of byte 8 are already in their final spots, so avr[i], i={0,1} can be and-ed
> with tcg_mask. For every following iteration, both avr[i] and tcg_mask have
> to be shifted right by 7 and 8 places, respectively, in order to get bit 1
> of byte 2, bit 2 of byte 3, ... bit 7 of byte 8 into their final spots, so
> the shifted avr values (saved in tmp) can be and-ed with the new value of
> tcg_mask...
> After the first 8 iterations (first loop), all the first bits are in their
> final places, all second bits except the second bit from the eighth byte are
> in their places... (only one eighth bit, the one from the eighth byte, is in
> its place). In the second loop we do all operations symmetrically, in order
> to get the other half of the bits into their final spots. Results for the
> first and second doubleword elements are saved in result[0] and result[1]
> respectively. In the end those results are saved in the appropriate
> doubleword element of destination register vD.
>
> Signed-off-by: Stefan Brankovic <address@hidden>
> ---
> target/ppc/helper.h | 1 -
> target/ppc/int_helper.c | 276
> ------------------------------------
> target/ppc/translate/vmx-impl.inc.c | 77 +++++++++-
> 3 files changed, 76 insertions(+), 278 deletions(-)
Reviewed-by: Richard Henderson <address@hidden>
r~
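
For readers following the description above, here is a minimal plain-C
sketch of the shift-and-mask idea for a single doubleword. It is not the
TCG code from the patch: the function names and the DIAG_MASK constant are
hypothetical, chosen for illustration, and the two loops of the description
are folded into one loop that does the right-shift and the symmetric
left-shift step together. Treating the doubleword as an 8x8 bit matrix, the
operation is a transpose: each diagonal moves as a unit by a multiple of 7
bit positions, and the mask moves by the same multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: bits on the main diagonal (bit k of byte k). */
#define DIAG_MASK 0x8040201008040201ULL

/* Shift-and-mask gather for one doubleword (an 8x8 bit-matrix transpose). */
static uint64_t gather_bits_shift_mask(uint64_t x)
{
    /* The main diagonal is already in its final position. */
    uint64_t result = x & DIAG_MASK;

    for (int d = 1; d < 8; d++) {
        /* Diagonal d on one side: data moves by 7*d, mask by 8*d. */
        result |= (x >> (7 * d)) & (DIAG_MASK >> (8 * d));
        /* The symmetric diagonal on the other side. */
        result |= (x << (7 * d)) & (DIAG_MASK << (8 * d));
    }
    return result;
}

/* Bit-by-bit reference: bit `bit` of byte `byte` goes to bit `byte` of byte `bit`. */
static uint64_t gather_bits_reference(uint64_t x)
{
    uint64_t result = 0;

    for (int byte = 0; byte < 8; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            if ((x >> (8 * byte + bit)) & 1) {
                result |= 1ULL << (8 * bit + byte);
            }
        }
    }
    return result;
}

/* Compare both versions on a few inputs; aborts via assert on mismatch. */
static void gather_bits_selfcheck(void)
{
    static const uint64_t samples[] = {
        0, ~0ULL, 0xFFULL, DIAG_MASK, 0x0123456789ABCDEFULL,
    };

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(gather_bits_shift_mask(samples[i]) ==
               gather_bits_reference(samples[i]));
    }
}
```

Because the transpose is symmetric, the result is the same whether bits and
bytes are both counted from the most significant end (as the ISA does) or
both from the least significant end (as the sketch does). Applying the
function twice gives back the original value, which makes for a cheap
sanity check.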