avr-gcc-list

Re: [avr-gcc-list] Optimiser bloats code


From: David Brown
Subject: Re: [avr-gcc-list] Optimiser bloats code
Date: Thu, 02 Aug 2007 10:49:32 +0200
User-agent: Thunderbird 2.0.0.5 (Windows/20070716)

Wouter van Gulik wrote:
Hello list,

After some testing on bit inversion etc. I thought I had found the most optimal
code.
But the optimiser came into play and started doing things I don't understand.

I have attached the files. I use WinAVR-20070525 and the default MFile makefile
(except that I build for a mega644).
Most shocking is the example at the bottom. The optimiser decides to init
with PINB&(1<<5) instead of 0.
OK, but it makes a mess of getting that bit into bit 0...


<snip>

The answers so far have concentrated on the unnecessary clearing of R25 when returning a uint8_t. Jörg has already given the reason for it, and explained why it would be difficult to change.

The other issue here is the poor code for the first bit in Wouter's function. I seem to remember a similar thread a while back, with the explanation being that avr-gcc's cost information for some operations was inaccurate, so it believes it is generating good code when it is not. The issue can be seen with a similar but smaller function like this:

#include <stdint.h>

uint8_t bittest2(uint8_t a) {
        uint8_t res = 0;
        if (a & 0x20) res |= 1;
        return res;
}

It seems that gcc is trying to eliminate the branch and merge the constants, transforming the function into "return (a >> 5) & 0x01;". On larger CPUs, where branches can be very expensive and shifts take a single cycle, this makes sense. On the AVR, it ends up with this code:

                 bittest2:
                 /* prologue: frame size=0 */
                 /* prologue end (size=0) */
 009e 9927               clr r25          ;  a
 00a0 65E0               ldi r22,5        ; ,
 00a2 9695       1:      lsr r25          ;  a
 00a4 8795               ror r24          ;  a
 00a6 6A95               dec r22          ;
 00a8 01F4               brne 1b
 00aa 8170               andi r24,lo8(1)  ;  <result>,
 00ac 9070               andi r25,hi8(1)  ;  <result>,
                 /* epilogue: frame size=0 */
 00ae 0895               ret
                 /* epilogue end (size=1) */
                 /* function uint8_t bittest2(uint8_t) size 9 (8) */

(I'm using gcc 4.1.2 with -Os here)
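The branch-free rewrite gcc appears to make is only legal because both forms agree for every 8-bit input. A minimal host-side sketch in plain C (not AVR-specific; the names `bittest2_shift` and `check_bittest2_equivalence` are hypothetical, introduced here for illustration) checks that exhaustively:

```c
#include <assert.h>
#include <stdint.h>

/* The branchy form from the post. */
uint8_t bittest2(uint8_t a) {
        uint8_t res = 0;
        if (a & 0x20) res |= 1;
        return res;
}

/* Hypothetical name: the branch-free form gcc appears to produce.
 * "a" is promoted to int before the shift (standard C integer promotion). */
uint8_t bittest2_shift(uint8_t a) {
        return (uint8_t)((a >> 5) & 0x01);
}

/* Hypothetical helper: compare both forms over all 256 inputs. */
void check_bittest2_equivalence(void) {
        for (unsigned v = 0; v < 256; v++) {
                assert(bittest2((uint8_t)v) == bittest2_shift((uint8_t)v));
        }
}
```

Because of that promotion, and because an int is 16 bits on the AVR, the shift in the listing above operates on the register pair r25:r24 rather than on r24 alone.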

Rewriting the function with inverted logic can make a sizeable difference:

uint8_t bittest5(uint8_t a) {
        uint8_t res = 1;
        if (!(a & 0x20)) res &= ~1;
        return res;
}

                 bittest5:
                 /* prologue: frame size=0 */
                 /* prologue end (size=0) */
 00ce 8295               swap r24         ;  tmp50
 00d0 8695               lsr r24          ;  tmp50
 00d2 8770               andi r24,0x7     ;  tmp50
 00d4 8170               andi r24,lo8(1)  ;  tmp50,
 00d6 9927               clr r25          ;  <result>
                 /* epilogue: frame size=0 */
 00d8 0895               ret
                 /* epilogue end (size=1) */
                 /* function uint8_t bittest5(uint8_t) size 6 (5) */
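Inverting the logic changes only the generated code, not the result: bittest5 starts from 1 and clears bit 0 only when bit 5 of "a" is clear, so it returns 1 exactly when bit 5 is set. A host-side sketch confirming this against the original form (the check function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Original form from the post. */
uint8_t bittest2(uint8_t a) {
        uint8_t res = 0;
        if (a & 0x20) res |= 1;
        return res;
}

/* Inverted-logic form from the post. */
uint8_t bittest5(uint8_t a) {
        uint8_t res = 1;
        if (!(a & 0x20)) res &= ~1;
        return res;
}

/* Hypothetical helper: both functions must agree on all 256 inputs. */
void check_bittest5_matches_bittest2(void) {
        for (unsigned v = 0; v < 256; v++) {
                assert(bittest5((uint8_t)v) == bittest2((uint8_t)v));
        }
}
```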

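What seems to have made the difference (I'm guessing here; I don't know the internals of gcc nearly as well as I should) is that in the second case the compiler has managed to avoid promoting "a" to an int before doing the shift. An int is 16 bits on the AVR, so the promoted shift in bittest2 has to loop over the register pair r25:r24, whereas the 8-bit shift in bittest5 is done with a swap and a single lsr. This results in pretty optimal code.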
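The AVR shifts only one bit per instruction, but "swap" exchanges the two nibbles of a register, which for the bits of interest acts like a shift by four. A sketch modelling the bittest5 listing instruction by instruction in C, with r24 holding "a" (the helper names are hypothetical; this models register contents, not gcc internals):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the AVR "swap" instruction,
 * which exchanges the high and low nibbles of a register. */
static uint8_t avr_swap(uint8_t r) {
        return (uint8_t)((r << 4) | (r >> 4));
}

/* Hypothetical helper: the bittest5 listing, step by step. */
uint8_t bittest5_as_asm(uint8_t r24) {
        r24 = avr_swap(r24);        /* swap r24: bits 7..5 now sit at 3..1 */
        r24 = (uint8_t)(r24 >> 1);  /* lsr r24: bits 7..5 now sit at 2..0  */
        r24 &= 0x07;                /* andi r24,0x7: r24 == a >> 5         */
        r24 &= 0x01;                /* andi r24,lo8(1): original bit 5     */
        return r24;
}

/* Hypothetical helper: swap + lsr + andi 0x7 equals an 8-bit a >> 5. */
void check_swap_lsr_is_shift_by_5(void) {
        for (unsigned v = 0; v < 256; v++) {
                uint8_t a = (uint8_t)v;
                uint8_t r = (uint8_t)((avr_swap(a) >> 1) & 0x07);
                assert(r == (a >> 5));
                assert(bittest5_as_asm(a) == ((a >> 5) & 0x01));
        }
}
```

Three single-cycle instructions replace the five-iteration, 16-bit loop seen in bittest2.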

There is also the issue of the two "andi r24" instructions: "andi r24,0x7" followed by "andi r24,lo8(1)" could be a single "andi r24,0x01". Perhaps that is a missed peephole optimisation?
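Merging the two masks is safe because AND is associative: (x & 0x07) & 0x01 equals x & (0x07 & 0x01), which is x & 0x01. A small host-side check of that identity (function names are hypothetical; this sketches the identity, not any existing gcc pass). Dropping one andi would take bittest5 from 6 words to 5:

```c
#include <assert.h>
#include <stdint.h>

/* The two masks as emitted in the bittest5 listing. */
uint8_t two_andi(uint8_t r24) {
        r24 &= 0x07;  /* andi r24,0x7    */
        r24 &= 0x01;  /* andi r24,lo8(1) */
        return r24;
}

/* The single mask a peephole pass could emit instead. */
uint8_t one_andi(uint8_t r24) {
        r24 &= 0x07 & 0x01;  /* folds to andi r24,0x01 */
        return r24;
}

/* Hypothetical helper: both sequences agree on all 256 register values. */
void check_andi_merge(void) {
        for (unsigned v = 0; v < 256; v++) {
                assert(two_andi((uint8_t)v) == one_andi((uint8_t)v));
        }
}
```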

Best regards,

David



