Re: [avr-gcc-list] Optimiser bloats code
From: David Brown
Subject: Re: [avr-gcc-list] Optimiser bloats code
Date: Thu, 02 Aug 2007 10:49:32 +0200
User-agent: Thunderbird 2.0.0.5 (Windows/20070716)
Wouter van Gulik wrote:
> Hello list,
> After some testing on bit inversion etc if thought I found the most optimal
> code.
> But the optimiser came in to play and start doing things I don't understand.
> I attached the files. I use WinAVR-20070525 and default MFile makefile
> (except I build for a mega644)
> Most shocking is the example in the bottom. The optimiser decides to init
> with PINB&(1<<5) instead of 0.
> Ok, but it makes a mess of getting the that bit on bit 0...
<snip>
The answers so far have concentrated on the unnecessary clearing of R25
when returning an uint8_t - Jörg has already given the reason for it,
and said why it would be difficult to change.
The other issue here is the poor code for the first part of Wouter's
function. I seem to remember a similar thread a while back, with the
explanation being that avr-gcc's cost information for some operations
was inaccurate, so that it believes it is generating good code when it
is not. The issue can be seen with a similar but smaller function like this:
uint8_t bittest2(uint8_t a) {
    uint8_t res = 0;
    if (a & 0x20) res |= 1;
    return res;
}
It seems that gcc is attempting to eliminate the branch and to merge the
constants to transform the function into "return (a >> 5) & 0x01;". On
larger cpus, where branches can be hugely expensive and shifts are
single-cycle, this makes sense - on the avr, it ends up with the code:
bittest2:
/* prologue: frame size=0 */
/* prologue end (size=0) */
009e 9927        clr r25            ; a
00a0 65E0        ldi r22,5          ; ,
00a2 9695     1: lsr r25            ; a
00a4 8795        ror r24            ; a
00a6 6A95        dec r22            ;
00a8 01F4        brne 1b
00aa 8170        andi r24,lo8(1)    ; <result>,
00ac 9070        andi r25,hi8(1)    ; <result>,
/* epilogue: frame size=0 */
00ae 0895        ret
/* epilogue end (size=1) */
/* function uint8_t bittest2(uint8_t) size 9 (8) */
(I'm using gcc 4.1.2 with -Os here)
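To make the transformation concrete, here is a small host-side sketch (plain C, nothing AVR-specific) that checks the branching form against the shift-and-mask form for every possible input. The names bittest2_shift and count_mismatches are made up for this illustration; the shift form is only my reading of what gcc appears to be doing.

```c
#include <stdint.h>

/* Original form from the message: test bit 5, set bit 0 of the result. */
uint8_t bittest2(uint8_t a) {
    uint8_t res = 0;
    if (a & 0x20) res |= 1;
    return res;
}

/* Branch-free form that gcc appears to derive (hypothetical name). */
uint8_t bittest2_shift(uint8_t a) {
    return (a >> 5) & 0x01;
}

/* Count the inputs for which the two forms disagree; 0 means equivalent. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int v = 0; v < 256; v++) {
        if (bittest2((uint8_t)v) != bittest2_shift((uint8_t)v))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() should return 0, so the two forms compute the same result and the only question is which one is cheaper on the target.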
Re-writing to invert the logic can make a sizeable difference:
uint8_t bittest5(uint8_t a) {
    uint8_t res = 1;
    if (!(a & 0x20)) res &= ~1;
    return res;
}
bittest5:
/* prologue: frame size=0 */
/* prologue end (size=0) */
00ce 8295        swap r24           ; tmp50
00d0 8695        lsr r24            ; tmp50
00d2 8770        andi r24,0x7       ; tmp50
00d4 8170        andi r24,lo8(1)    ; tmp50,
00d6 9927        clr r25            ; <result>
/* epilogue: frame size=0 */
00d8 0895        ret
/* epilogue end (size=1) */
/* function uint8_t bittest5(uint8_t) size 6 (5) */
What seems to have made the difference (I'm guessing here - I don't know
the internals of gcc nearly as well as I should) is that in the second
case, the compiler has managed to avoid promoting "a" to an int before
doing the shift, so the whole operation stays in a single 8-bit register
and the shift loop disappears.
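For anyone unsure what "promoting to an int" means here: in C the operands of a shift undergo integer promotion, so a shift of a uint8_t is nominally carried out in int (16 bits with avr-gcc's default settings). The sketch below just makes that visible with sizeof; the function name shift_result_size is made up for the example, and it says nothing about what gcc does internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type in which "a >> 5" is evaluated.
 * sizeof does not evaluate its operand; it only inspects the type,
 * which is int after integer promotion of the uint8_t operand. */
size_t shift_result_size(uint8_t a) {
    return sizeof(a >> 5);
}
```

On any platform the result equals sizeof(int), not sizeof(uint8_t). That is consistent with the clr r25 and the lsr/ror pair in the bittest2 listing, which treat the value as a 16-bit quantity.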
There is also the issue of the two "andi r24" instructions - perhaps
that is a missed peephole optimisation?
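As a rough check of that suspicion, the bittest5 listing can be modelled in plain C, one statement per instruction, and compared against the same sequence with the 0x7 mask left out. This is only a host-side model under the assumption that the C statements mirror the instructions faithfully; all function names are invented for the example.

```c
#include <stdint.h>

/* Model of the AVR SWAP instruction: exchange high and low nibbles. */
static uint8_t swap_nibbles(uint8_t x) {
    return (uint8_t)((x << 4) | (x >> 4));
}

/* Step-by-step model of the bittest5 listing, both masks included. */
uint8_t model_with_both_masks(uint8_t a) {
    uint8_t r = swap_nibbles(a);   /* swap r24         */
    r >>= 1;                       /* lsr  r24         */
    r &= 0x07;                     /* andi r24,0x7     */
    r &= 0x01;                     /* andi r24,lo8(1)  */
    return r;
}

/* The same sequence with the 0x7 mask dropped. */
uint8_t model_with_one_mask(uint8_t a) {
    uint8_t r = swap_nibbles(a);
    r >>= 1;
    r &= 0x01;
    return r;
}

/* Count inputs where the two models, or the plain (a >> 5) & 1, disagree. */
int count_mask_mismatches(void) {
    int mismatches = 0;
    for (int v = 0; v < 256; v++) {
        uint8_t a = (uint8_t)v;
        uint8_t expected = (a >> 5) & 0x01;
        if (model_with_both_masks(a) != expected ||
            model_with_one_mask(a) != expected)
            mismatches++;
    }
    return mismatches;
}
```

count_mask_mismatches() should return 0: masking with 0x7 and then with 1 gives the same result as masking with 1 alone, so the first andi does no useful work in this function.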
Best regards,
David