[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-gcc-list] [avr-libc]: Smarter parity implementation
From: |
Georg-Johann Lay |
Subject: |
Re: [avr-gcc-list] [avr-libc]: Smarter parity implementation |
Date: |
Mon, 01 Aug 2011 15:10:50 +0200 |
User-agent: |
Thunderbird 2.0.0.24 (X11/20100302) |
Paulo Marques wrote:
> Georg-Johann Lay wrote:
>> The current implementation of parity in avr-libc compiles
>>
>> #include <stdint.h>
>> #include <util/parity.h>
>>
>> uint8_t pari1 (uint8_t val)
>> {
>> return parity_even_bit (val);
>> }
>>
>> with -Os to
>>
>> pari1:
>> /* #APP */
>> mov __tmp_reg__,r24
>> swap r24
>> eor r24,__tmp_reg__
>> mov __tmp_reg__,r24
>> lsr r24
>> lsr r24
>> eor r24,__tmp_reg__
>> /* #NOAPP */
>> ldi r25,lo8(0)
>> adiw r24,1
>> asr r25
>> ror r24
>> andi r24,lo8(1)
>> ret
>>
>> which are 12 instructions, 13 ticks and 2 regs.
>>
>> Parity can be implemented with 9 instructions
>> and 9 ticks and one reg. Instead of r24 any d-reg
>> can be used:
>>
>> ;; r24 = parity8 (r24)
>> ;; clobbers: __tmp_reg__
>> parity8:
>> ;; parity is in r24[0..7]
>> mov __tmp_reg__, r24
>> swap __tmp_reg__
>> eor r24, __tmp_reg__
>> ;; parity is in r24[0..3]
>> subi r24, -4
>> andi r24, -5
>> subi r24, -6
>> ;; parity is in r24[0,3]
>> sbrc r24, 3
>> inc r24
>> ;; parity is in r24[0]
>> andi r24, 1
>>
>> An implementation similar to the original avr-libc
>> using one instruction/tick more is
>>
>> mov __tmp_reg__,r24
>> swap r24
>> eor r24,__tmp_reg__
>> mov __tmp_reg__,r24
>> lsr r24
>> lsr r24
>> eor r24,__tmp_reg__
>> lsr r24
>> sbci r24,0
>> andi r24,1
>
> IIRC you're required to clear r25 even if the return type is uint8_t,
> i.e., the return type needs to be upgraded to "int".
>
> I think this happens because, since most expressions are upgraded to int
> anyway, it is cheaper to do it in one place than in all call sites.
The current parity_even_bit is not a function, it's just a macro that
expands as
#define parity_even_bit(val) \
(__extension__({ \
unsigned char __t; \
__asm__ ( \
"mov __tmp_reg__,%0" "\n\t" \
"swap %0" "\n\t" \
"eor %0,__tmp_reg__" "\n\t" \
"mov __tmp_reg__,%0" "\n\t" \
"lsr %0" "\n\t" \
"lsr %0" "\n\t" \
"eor %0,__tmp_reg__" \
: "=r" (__t) \
: "0" ((unsigned char)(val)) \
: "r0" \
); \
(((__t + 1) >> 1) & 1); \
}))
the promotion is triggered by the last line, but there is
no need for a premature promotion; the last line could be
((unsigned char) (__t & 1))
or
__t;
depending on what the asm does.
Doing the AND outside asm might be smart because tests like
if (parity_even_bit (x))
get parts or condition code like Z-flag from AND.
Anyway, even if parity_even_bit was a function
unsigned char parity_even_bit (unsigned car);
there was no need to promote the return value.
The current text of the AVR ABI states that 8-bit return
values are always promoted to int.
However, avr-gcc does not behave that way since 4.3+ and
in earlier version the promotion was just because of an
optimization flaw; there is/was no code in GCC's avr
backend that describes/described such a promotion.
There was a discussion between some avr maintainers some
time ago that addressed that issue. The outcome was to
update the ABI instead of forcing avr-gcc to promote.
Updating the ABI will allow for better code generation,
in particular in avr-libc's assembler parts.
The only case when such an ABI change would trigger problems
is when assembler code calls a C-function returning an 8-bit
value and expects the return value to be promoted.
As avr-gcc actually does not promote since 4.3+ and there
is not a single bug report in that direction (at least non
I am aware of), I think it's very much preferred to adapt
the ABI to avr-gcc's behavior than the other way round.
BTW, avr-gcc 4.7 comes with optimized versions of some buitins,
e.g. parity:
unsigned char my_parity (unsigned char a, unsigned char b)
{
return __builtin_parity (a) + b;
}
which translates to
my_parity:
rcall __parityqi2
add r24,r22
ret
Notice that the compiler knows the register footprint, i.e.
it knows that b (passed in r22) will survive the call.
__parityqi2 uses the algorithm from above so that it might
be desirable to use __builtin_parity when optimizing for size.
Johann