Re: [Qemu-arm] [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE P
From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group
Date: Fri, 23 Feb 2018 11:59:14 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
On 02/23/2018 07:15 AM, Peter Maydell wrote:
>> +static const uint64_t expand_bit_data[5][2] = {
>> + { 0x1111111111111111ull, 0x2222222222222222ull },
>> + { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
>> + { 0x000f000f000f000full, 0x00f000f000f000f0ull },
>> + { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
>> + { 0x000000000000ffffull, 0x00000000ffff0000ull }
>> +};
>> +
>> +/* Expand units of 2**N bits to units of 2**(N+1) bits,
>> + with the higher bits zero. */
>
> In bitops.h we call this operation "half shuffle" (where
> it is specifically working on units of 1 bit size), and
> the inverse "half unshuffle". Worth mentioning that (or
> using similar terminology) ?
I hadn't noticed this helper. I'll at least mention it.
FWIW, the half shuffle/unshuffle operation is what you get with N=0, which
corresponds to a byte predicate interleave. We need the intermediate steps for
half, single, and double predicate interleaves.
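A minimal sketch of how the quoted expand_bit_data table can drive the expansion, working from the widest unit (i = 4) down to 2**n bits. Only the opening lines of the patch's expand_bits are quoted, so the loop body here is a reconstruction under that assumption, not the patch's code. With n = 0 it performs the full half shuffle; n = 1, 2 and 3 stop early, giving the spacing needed for halfword, word and doubleword predicate interleaves.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the expand_bit_data table quoted from the patch. */
static const uint64_t expand_bit_data[5][2] = {
    { 0x1111111111111111ull, 0x2222222222222222ull },
    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
    { 0x000000000000ffffull, 0x00000000ffff0000ull }
};

/*
 * Sketch: spread the low 32 bits of x so that each unit of 2**n bits
 * lands in the low half of a 2**(n+1)-bit unit, with the high half zero.
 * n is expected in 0..3; i walks from 4 down to n.
 */
static uint64_t expand_bits(uint64_t x, int n)
{
    int i, sh;

    for (i = 4; i >= n; i--) {
        sh = 1 << i;
        /* Keep the low unit of each pair in place, move the high unit up by sh. */
        x = ((x & expand_bit_data[i][1]) << sh) | (x & expand_bit_data[i][0]);
    }
    return x;
}
```

For example, expand_bits(0xf, 0) spreads four bits to the even positions (0x55), while expand_bits(0xf, 1) moves two-bit units apart (0x33).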
>> +static uint64_t expand_bits(uint64_t x, int n)
>> +{
>> + int i, sh;
>
> Worth asserting that n is within the range we expect it to be ?
> (what range is that? 0 to 4?)
N goes from 0 to 3; i goes from 0 to 4. N will have been constrained by decode,
so I'm not sure it's worth an assert. Even if I did add one, I wouldn't want it
here, at the center of a loop kernel.
>> + d[0] = nn + (mm << (1 << esz));
>
> Is this actually doing an addition, or is it just an odd
> way of writing a bitwise OR when neither of the two
> inputs have 1 in the same bit position?
It could be an OR. Here I'm hoping that the compiler will use a shift-add
instruction, which it wouldn't necessarily do if I wrote it with an OR, since it
would first have to prove by itself that the two operands share no set bits.
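Why the addition is safe here: after expansion, nn only has bits in the even-numbered 2**esz-bit units, and shifting mm left by 2**esz moves its bits into the odd-numbered units. No bit position is set in both operands, so no carries occur and x + y equals x | y. The sketch below checks that identity on pseudo-random inputs; the masks are the even_bit_esz_masks values, while the generator and the function names are arbitrary choices for illustration. It does not check whether a given compiler actually emits a shift-add (such as AArch64 ADD with a shifted-register operand) for the + form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the even-numbered 2**esz-bit units (same values as even_bit_esz_masks). */
static const uint64_t even_units[4] = {
    0x5555555555555555ull,
    0x3333333333333333ull,
    0x0f0f0f0f0f0f0f0full,
    0x00ff00ff00ff00ffull
};

/* Simple 64-bit LCG, used only to generate test inputs. */
static uint64_t next_rand(uint64_t *state)
{
    *state = *state * 6364136223846793005ull + 1442695040888963407ull;
    return *state;
}

/*
 * Returns true if nn + (mm << (1 << esz)) == nn | (mm << (1 << esz))
 * for `trials` random nn, mm restricted to even units, the shape they
 * have after expansion.
 */
static bool add_matches_or(int esz, int trials)
{
    uint64_t state = 0x243f6a8885a308d3ull;
    int k;

    for (k = 0; k < trials; k++) {
        uint64_t nn = next_rand(&state) & even_units[esz];
        uint64_t mm = next_rand(&state) & even_units[esz];
        uint64_t hi = mm << (1 << esz);   /* lands in the odd units */

        if ((nn & hi) != 0 || nn + hi != (nn | hi)) {
            return false;
        }
    }
    return true;
}
```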
>> + d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
>
> This looks like it's using addition for logical OR again ?
Yes, although this time I admit it'll never produce an LEA.
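The same reasoning applies to the quoted extract64() line, assuming oprsz is the predicate size in bytes (at most 8 on this single-word path): l holds 4 * oprsz significant bits and h is shifted up by exactly that amount, so the sum is a plain concatenation, which extract64() then trims to 8 * oprsz bits. Since the shift count is at least 4, it is outside the 0 to 3 shift range an x86 LEA scale factor (1, 2, 4 or 8) can encode. In this self-contained sketch, extract64_local() is a stand-in written to match QEMU's extract64(value, start, length) semantics, and join_halves() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same semantics as QEMU's extract64(value, start, length). */
static uint64_t extract64_local(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ull >> (64 - length));
}

/*
 * Hypothetical helper: join two half-predicates of 4 * oprsz bits each,
 * l in the low half and h in the high half.  Because l has no bits at or
 * above 4 * oprsz, the + behaves exactly like |.
 */
static uint64_t join_halves(uint64_t l, uint64_t h, int oprsz)
{
    assert(oprsz >= 1 && oprsz <= 8);
    assert((l >> (4 * oprsz)) == 0);
    return extract64_local(l + (h << (4 * oprsz)), 0, 8 * oprsz);
}
```

For a 2-byte predicate, join_halves(0xab, 0xcd, 2) yields 0xcdab; any bits of h above the predicate size are discarded by the extract.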
>> + /* For VL which is not a power of 2, the results from M do not
>> + align nicely with the uint64_t for D. Put the aligned results
>> + from M into TMP_M and then copy it into place afterward. */
>
> How much risu testing did you do of funny vector lengths ?
As much as I can with the unlicensed Foundation Platform: all lengths from 1 to 4.
Unfortunately, that does leave a few multi-word predicate paths untested, but
many of the routines loop identically within this length and beyond.
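The "copy it into place afterward" step in the quoted comment amounts to OR-ing a word-aligned buffer into the destination at a bit offset that need not be a multiple of 64. The patch's own copy code is not quoted, so the helper below is a hypothetical sketch of that step. It assumes the target bits of d are already zero and that d is large enough to hold off + nbits bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: OR the low nbits bits of the word-aligned array
 * src into d, starting at bit offset off.  Assumes the target bits of d
 * are zero and that d holds at least off + nbits bits.
 */
static void copy_bits_to_offset(uint64_t *d, size_t off,
                                const uint64_t *src, size_t nbits)
{
    size_t words = (nbits + 63) / 64;
    size_t base = off / 64;
    unsigned sh = off % 64;
    size_t i;

    for (i = 0; i < words; i++) {
        uint64_t w = src[i];

        /* Drop bits beyond nbits in the final partial word. */
        if (i == words - 1 && nbits % 64 != 0) {
            w &= (1ull << (nbits % 64)) - 1;
        }
        d[base + i] |= w << sh;
        /* Bits pushed past the top of this word spill into the next one. */
        if (sh != 0 && (w >> (64 - sh)) != 0) {
            d[base + i + 1] |= w >> (64 - sh);
        }
    }
}
```

Copying 48 set bits to offset 40 fills the top 24 bits of d[0] and the low 24 bits of d[1].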
>> +static const uint64_t even_bit_esz_masks[4] = {
>> + 0x5555555555555555ull,
>> + 0x3333333333333333ull,
>> + 0x0f0f0f0f0f0f0f0full,
>> + 0x00ff00ff00ff00ffull
>> +};
>
> Comment describing the purpose of these numbers would be useful.
Ack.
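A comment along the lines requested might say: an SVE predicate has one bit per vector byte, so an element of 2**esz bytes owns 2**esz predicate bits, and even_bit_esz_masks[esz] selects the bits of the even-numbered elements (2**esz ones followed by 2**esz zeros, repeated). The sketch below derives the same constants from that description. even_mask() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * even_bit_esz_masks[esz] selects the predicate bits belonging to the
 * even-numbered elements of size 2**esz bytes: each element owns
 * 2**esz predicate bits, so the pattern is 2**esz ones, 2**esz zeros.
 */
static const uint64_t even_bit_esz_masks[4] = {
    0x5555555555555555ull,
    0x3333333333333333ull,
    0x0f0f0f0f0f0f0f0full,
    0x00ff00ff00ff00ffull
};

/* Hypothetical helper: build the same mask from its definition. */
static uint64_t even_mask(int esz)
{
    int unit = 1 << esz;                       /* predicate bits per element */
    uint64_t one_unit = (1ull << unit) - 1;    /* unit <= 8, so no overflow */
    uint64_t mask = 0;
    int pos;

    for (pos = 0; pos < 64; pos += 2 * unit) {
        mask |= one_unit << pos;
    }
    return mask;
}
```

As a usage example, an all-active halfword predicate for a 128-bit vector is 0x5555; masking it with even_bit_esz_masks[1] leaves 0x1111, i.e. only elements 0, 2, 4 and 6.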
r~