
Re: [Qemu-arm] [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE I


From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - Vectors Group
Date: Fri, 23 Feb 2018 12:57:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 02/23/2018 08:29 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <address@hidden> wrote:
>> Signed-off-by: Richard Henderson <address@hidden>
>> ---
> 
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
>> index 86cd792cdf..ae433861f8 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -46,14 +46,14 @@
>>   *
>>   * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
>>   * and bit 0 set if C is set.
>> - *
>> - * This is an iterative function, called for each Pd and Pg word
>> - * moving forward.
>>   */
>>
>>  /* For no G bits set, NZCV = C.  */
>>  #define PREDTEST_INIT  1
>>
>> +/* This is an iterative function, called for each Pd and Pg word
>> + * moving forward.
>> + */
> 
> Why move this comment?

I meant to fold this change into the first patch.  But I moved the comment
so that I can separately document...

>> +/* This is an iterative function, called for each Pd and Pg word
>> + * moving backward.
>> + */
>> +static uint32_t iter_predtest_bwd(uint64_t d, uint64_t g, uint32_t flags)

... this.

>> +    do {                                                                \
>> +        uint64_t out = 0, pg;                                           \
>> +        do {                                                            \
>> +            i -= sizeof(TYPE), out <<= sizeof(TYPE);                    \
>> +            TYPE nn = *(TYPE *)(vn + H(i));                             \
>> +            TYPE mm = *(TYPE *)(vm + H(i));                             \
>> +            out |= nn OP mm;                                            \
>> +        } while (i & 63);                                               \
>> +        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                       \
>> +        out &= pg;                                                      \
>> +        *(uint64_t *)(vd + (i >> 3)) = out;                             \
>> +        flags = iter_predtest_bwd(out, pg, flags);                      \
>> +    } while (i > 0);                                                    \
>> +    return flags;                                                       \
>> +}
> 
> Why do we iterate backwards through the vector? As far as I can
> see the pseudocode iterates forwards, and I don't think it
> makes a difference to the result which way we go.

You're right, it does not make a difference to the result which way we iterate.
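
A minimal sketch of why the direction does not matter (this is not the code
from the patch): N depends only on the first active element, C only on the
last, and Z on whether any active element is set, so the flags can be
accumulated from either end.  The names predtest_fwd, predtest_bwd, low_bit
and high_bit are hypothetical, and the `seen` variable is an assumption made
for readability.  The flag layout follows the comment quoted above: bit 31 is
N, bit 1 is set if Z is clear, bit 0 is C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREDTEST_INIT 1u    /* no G bits set: only C is set */

/* Lowest set bit of a nonzero word: the first active element. */
static uint64_t low_bit(uint64_t g)
{
    return g & (~g + 1);
}

/* Highest set bit of a nonzero word: the last active element. */
static uint64_t high_bit(uint64_t g)
{
    uint64_t b = 1;
    assert(g != 0);
    while (g >>= 1) {
        b <<= 1;
    }
    return b;
}

/* Walk the predicate words from low to high. */
static uint32_t predtest_fwd(const uint64_t *d, const uint64_t *g, size_t words)
{
    uint32_t flags = PREDTEST_INIT;
    int seen = 0;
    for (size_t w = 0; w < words; w++) {
        if (g[w] == 0) {
            continue;
        }
        if (!seen) {            /* N comes from the first active element */
            flags |= (uint32_t)((d[w] & low_bit(g[w])) != 0) << 31;
            seen = 1;
        }
        flags |= (uint32_t)((d[w] & g[w]) != 0) << 1;   /* accumulate !Z */
        /* C comes from the last active element; later words overwrite. */
        flags = (flags & ~1u) | (uint32_t)((d[w] & high_bit(g[w])) == 0);
    }
    return flags;
}

/* Walk the predicate words from high to low. */
static uint32_t predtest_bwd(const uint64_t *d, const uint64_t *g, size_t words)
{
    uint32_t flags = PREDTEST_INIT;
    int seen = 0;
    for (size_t w = words; w-- > 0; ) {
        if (g[w] == 0) {
            continue;
        }
        if (!seen) {            /* C comes from the last active element */
            flags = (flags & ~1u) | (uint32_t)((d[w] & high_bit(g[w])) == 0);
            seen = 1;
        }
        flags |= (uint32_t)((d[w] & g[w]) != 0) << 1;   /* accumulate !Z */
        /* N comes from the first active element; lower words overwrite. */
        flags = (flags & ~(1u << 31))
              | (uint32_t)((d[w] & low_bit(g[w])) != 0) << 31;
    }
    return flags;
}
```

Both functions return the same value for any pair of d and g arrays; only the
order in which N and C are settled differs.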

Of the several different ways I've written loops over predicates, this is my
favorite.  It has several points in its favor:

  1) Operate on full uint64_t predicate units instead
     of uint8_t or uint16_t sub-units.  This means

     1a) No big-endian adjustment required,
     1b) Fewer memory loads.

  2) No separate loop tail; it is shared with the main loop body.

  3) Specific to predicate output: the main loop gets to run
     un-predicated, and the governing predicate is applied once
     per 64-bit word at the end: out &= pg.
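
The loop shape described in the list above can be sketched as follows.  This
is an illustration under assumptions, not the macro from the patch: it
hard-codes 16-bit elements and an equality compare, takes typed arrays
instead of byte pointers so that no H() adjustment appears, and leaves out
the flags computation.  cmpeq_h_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare 16-bit elements of vn and vm for equality over oprsz bytes.
 * One predicate bit per vector byte; for 16-bit elements only every
 * second bit is significant, hence the 0x5555... mask.
 * pd and pg must hold at least (oprsz + 63) / 64 words.
 */
static void cmpeq_h_sketch(uint64_t *pd, const uint16_t *vn,
                           const uint16_t *vm, const uint64_t *pg,
                           size_t oprsz)
{
    size_t i = oprsz;           /* byte offset, walking backward */

    assert(oprsz > 0 && oprsz % sizeof(uint16_t) == 0);
    do {
        uint64_t out = 0;
        /* Un-predicated inner loop: build one 64-bit predicate word. */
        do {
            i -= sizeof(uint16_t);
            out <<= sizeof(uint16_t);
            out |= (vn[i / 2] == vm[i / 2]);
        } while (i & 63);
        /* Apply the governing predicate once per word. */
        out &= pg[i >> 6] & 0x5555555555555555ull;
        pd[i >> 6] = out;
    } while (i > 0);
}
```

With oprsz = 80, the first pass of the inner loop covers only bytes 64..79
and stops when i reaches 64.  That partial word is the "tail", and it is
handled by the same code as the full words.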


r~


