RE: [avr-gcc-list] Optimiser bloats code

From:	Eric Weddington
Subject:	RE: [avr-gcc-list] Optimiser bloats code
Date:	Thu, 02 Aug 2007 07:37:59 -0600


> -----Original Message-----
> From: address@hidden [mailto:address@hidden] On Behalf Of David Brown
> Sent: Thursday, August 02, 2007 2:50 AM
> To: address@hidden
> Subject: Re: [avr-gcc-list] Optimiser bloats code
>
> Wouter van Gulik wrote:
> > Hello list,
> >
> > After some testing on bit inversion etc. I thought I had found the
> > most optimal code. But the optimiser came into play and started doing
> > things I don't understand.
> >
> > I attached the files. I use WinAVR-20070525 and the default MFile
> > makefile (except I build for a mega644).
> > Most shocking is the example at the bottom. The optimiser decides to
> > init with PINB&(1<<5) instead of 0.
> > OK, but it makes a mess of getting that bit into bit 0...
> >
>
> <snip>
>
> The answers so far have concentrated on the unnecessary clearing of R25
> when returning a uint8_t - Jörg has already given the reason for it,
> and said why it would be difficult to change.
>
> The other issue here is the poor code for the first bit in Wouter's
> function.  I seem to remember a similar thread a while back, with the
> explanation being that avr-gcc's cost information for some operations
> was inaccurate, leading it to think it is generating good code.  The
> issue can be seen with a similar but smaller function like this:
>
> uint8_t bittest2(uint8_t a) {
>       uint8_t res = 0;
>       if (a & 0x20) res |= 1;
>       return res;
> }
>
> It seems that gcc is attempting to eliminate the branch and to merge the
> constants to transform the function into "return (a >> 5) & 0x01;".  On
> larger CPUs, where branches can be hugely expensive and shifts are
> single-cycle, this makes sense - on the AVR, it ends up with the code:
>
>                 bittest2:
>                 /* prologue: frame size=0 */
>                 /* prologue end (size=0) */
> 009e 9927               clr r25  ;  a
> 00a0 65E0               ldi r22,5        ; ,
> 00a2 9695       1:      lsr r25  ;  a
> 00a4 8795               ror r24  ;  a
> 00a6 6A95               dec r22  ;
> 00a8 01F4               brne 1b
> 00aa 8170               andi r24,lo8(1)  ;  <result>,
> 00ac 9070               andi r25,hi8(1)  ;  <result>,
>                 /* epilogue: frame size=0 */
> 00ae 0895               ret
>                 /* epilogue end (size=1) */
>                 /* function uint8_t bittest2(uint8_t) size 9 (8) */
>
> (I'm using gcc 4.1.2 with -Os here)
>
> Re-writing to invert the logic can make a sizeable difference:
>
> uint8_t bittest5(uint8_t a) {
>       uint8_t res = 1;
>       if (!(a & 0x20)) res &= ~1;
>       return res;
> }
>
>                 bittest5:
>                 /* prologue: frame size=0 */
>                 /* prologue end (size=0) */
> 00ce 8295               swap r24         ;  tmp50
> 00d0 8695               lsr r24  ;  tmp50
> 00d2 8770               andi r24,0x7     ;  tmp50
> 00d4 8170               andi r24,lo8(1)  ;  tmp50,
> 00d6 9927               clr r25  ;  <result>
>                 /* epilogue: frame size=0 */
> 00d8 0895               ret
>                 /* epilogue end (size=1) */
>                 /* function uint8_t bittest5(uint8_t) size 6 (5) */
>
> What seems to have made the difference (I'm guessing here - I don't know
> the internals of gcc nearly as well as I should) is that in the second
> case, the compiler has managed to avoid promoting "a" to an int before
> doing the shift - this results in pretty optimal code.
>
> There is also the issue of the two "andi r24" instructions - perhaps
> that is a missed peephole optimisation?
>
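
The three forms discussed in the quoted message (the original branch, the inverted branch, and the shift-and-mask expression gcc appears to be aiming for) should all return the same value for every input. The following is only a host-side sketch to confirm that; the names bittest_shift and check_equivalence are made up for illustration and do not appear in the thread, and it says nothing about which code avr-gcc will generate for each form.

```c
#include <assert.h>
#include <stdint.h>

/* Original form from the thread: a branch sets bit 0 of the result. */
uint8_t bittest2(uint8_t a) {
    uint8_t res = 0;
    if (a & 0x20) res |= 1;
    return res;
}

/* Inverted form from the thread: start at 1, clear bit 0 if bit 5 is unset. */
uint8_t bittest5(uint8_t a) {
    uint8_t res = 1;
    if (!(a & 0x20)) res &= ~1;
    return res;
}

/* Hypothetical name: the branch-free expression quoted in the message. */
uint8_t bittest_shift(uint8_t a) {
    return (uint8_t)((a >> 5) & 0x01);
}

/* Hypothetical helper: asserts that all three forms agree on all 256 inputs. */
void check_equivalence(void) {
    for (unsigned a = 0; a < 256; a++) {
        uint8_t expected = bittest2((uint8_t)a);
        assert(bittest5((uint8_t)a) == expected);
        assert(bittest_shift((uint8_t)a) == expected);
    }
}
```

Calling check_equivalence() from a small test program on the host is enough to show that the size difference seen above comes purely from code generation, not from any difference in behaviour.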

Hi David,

That is very instructive!

If you have a chance, could you please file a GCC bug report for this
(inverting the logic produces shorter code)? Please add me to the CC list of
the bug report. The issue regarding the two "and" instructions is known and
is GCC bug #11259. See the list of AVR toolchain bugs here:
<http://www.nongnu.org/avr-libc/bugs.html>.

If you (or anyone) ever feel adventurous, we could always use more help on
GCC (or any part of the toolchain).
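
For anyone who wants to look at the two "and" instructions issue, here is a rough C model of the bittest5 sequence from the listing (swap, lsr, andi 0x7, andi 1). It is only a sketch: the avr_* helpers and the seq_* functions are hypothetical names that model the register result of each instruction and ignore the status flags. It suggests that the first andi can be dropped, since the final mask of 1 already clears everything the mask of 0x7 would.

```c
#include <assert.h>
#include <stdint.h>

/* SWAP Rd: exchange the high and low nibbles of the register. */
uint8_t avr_swap(uint8_t r) { return (uint8_t)((r << 4) | (r >> 4)); }

/* LSR Rd: logical shift right by one bit (carry flag not modelled). */
uint8_t avr_lsr(uint8_t r) { return (uint8_t)(r >> 1); }

/* ANDI Rd,K: bitwise AND with an immediate constant. */
uint8_t avr_andi(uint8_t r, uint8_t k) { return (uint8_t)(r & k); }

/* The sequence as it appears in the bittest5 listing. */
uint8_t seq_as_emitted(uint8_t a) {
    a = avr_swap(a);        /* bit 5 moves to bit 1 */
    a = avr_lsr(a);         /* bit 1 moves to bit 0 */
    a = avr_andi(a, 0x07);
    a = avr_andi(a, 0x01);
    return a;
}

/* The same sequence with the two masks merged into one. */
uint8_t seq_single_andi(uint8_t a) {
    a = avr_swap(a);
    a = avr_lsr(a);
    a = avr_andi(a, 0x01);
    return a;
}

/* Hypothetical helper: both sequences must yield bit 5 of the input. */
void check_sequences(void) {
    for (unsigned a = 0; a < 256; a++) {
        assert(seq_as_emitted((uint8_t)a) == seq_single_andi((uint8_t)a));
        assert(seq_as_emitted((uint8_t)a) == ((a >> 5) & 1u));
    }
}
```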

Kind regards,
Eric Weddington
