[avr-gcc-list] Missing compiler optimizations?


From: Erik Walthinsen
Subject: [avr-gcc-list] Missing compiler optimizations?
Date: Sat, 21 Jan 2006 22:58:08 -0800
User-agent: Debian Thunderbird 1.0.7 (X11/20051017)

I'm going through and converting select functions in my program to assembly, sometimes for code size, sometimes for speed. Usually I get about a 2x improvement in one category or the other, and not infrequently in both at once.

The scary thing is that when I look at what GCC is producing (both 3.4.4 [avr-patched] and 4.0.2 [un-patched]), some of it is missing really *basic* optimization concepts. It's a little shocking to me that 4.0 is generally slightly worse than 3.4, which isn't doing all that well to begin with.

For example:

#define PARTPINS 28
uint8_t pinconfig[PARTPINS];
void load_eeprom_defaults() {
  uint8_t i;
  uint8_t *ee = 0, *ram;

  ram = (uint8_t *)pinconfig;
  for (i=0;i<PARTPINS;i++) {
    *(ram++) = bootloader_eeprom_read_byte(ee++);
  }
}

With gcc 3.4.4 I get:

void load_eeprom_defaults() {
 678:   ef 92           push    r14
 67a:   ff 92           push    r15
 67c:   1f 93           push    r17
 67e:   cf 93           push    r28
 680:   df 93           push    r29
  uint8_t i;
  uint8_t *ee = 0, *ram;
 682:   c0 e0           ldi     r28, 0x00       ; 0
 684:   d0 e0           ldi     r29, 0x00       ; 0

  /* pinconfig */
  ram = (uint8_t *)pinconfig;
 686:   8f e3           ldi     r24, 0x3F       ; 63
 688:   e8 2e           mov     r14, r24
 68a:   81 e0           ldi     r24, 0x01       ; 1
 68c:   f8 2e           mov     r15, r24
 68e:   1b e1           ldi     r17, 0x1B       ; 27
  for (i=0;i<PARTPINS;i++) {
    *(ram++) = bootloader_eeprom_read_byte(ee++);
 690:   ce 01           movw    r24, r28
 692:   21 96           adiw    r28, 0x01       ; 1
 694:   a6 d9           rcall   .-3252          ; 0xfffff9e2 <__eeprom_end+0xff7ef9e2>
 696:   f7 01           movw    r30, r14
 698:   81 93           st      Z+, r24
 69a:   7f 01           movw    r14, r30
 69c:   11 50           subi    r17, 0x01       ; 1
 69e:   17 ff           sbrs    r17, 7
 6a0:   f7 cf           rjmp    .-18            ; 0x690 <load_eeprom_defaults+0x18>
 6a2:   df 91           pop     r29
 6a4:   cf 91           pop     r28
 6a6:   1f 91           pop     r17
 6a8:   ff 90           pop     r15
 6aa:   ef 90           pop     r14
 6ac:   08 95           ret
  }

GCC 4.0.2 is a little better, shaving two push/pop pairs but gaining an instruction somewhere else.

By hand, I get:

.global load_eeprom_defaults_asm
load_eeprom_defaults_asm:
/* Registers:
        r31:r30         ram address
        r27:r26         eeprom address
        r25:r24         eeprom_read_byte address argument
        r24             eeprom read byte
        r23             i (port)
*/
        ldi r26, lo8(0)         /* load zero EEPROM address */
        ldi r27, hi8(0)

/* load the pinconfig array */
        ldi r30, lo8(pinconfig) /* load Z with pinconfig */
        ldi r31, hi8(pinconfig)
        ldi r23, PARTPINS       /* start loop with PARTPINS */
pinconfigLoop:
        movw r24, r26           /* copy EEPROM addr to argument */
        rcall eeprom_read_byte
        st Z+, r24              /* store the result in pinconfig[] */
        adiw r26, 1             /* increment the EEPROM addr */
        dec r23                 /* decrement pin count */
        brne pinconfigLoop      /* ...and loop */

        ret

That's a reduction from 27 instructions to 12, and *zero* stack space. Run time (non-critical in this case, but...) should be ~15 cycles instead of ~40.

Right off the bat, the compiler is mostly using low registers, as well as Y (r29:r28), all of which have to be saved. Why isn't it using the caller-saved high registers (r18-r27, r30-r31) instead, saving more than a third of the code size and half the cycle count on the push/pops alone?

The use of low registers has a number of other major disadvantages. First, when loading the 'ram' pointer, the compiler has to ldi into high registers (ldi only works on r16-r31) and then mov into the low registers. Another one, not seen in this particular example unless you switch from *(ram++) to pinconfig[i] as the lvalue, is the loss of adiw. Because the compiler keeps the pointer in low registers, it has to use subi+sbci with 0xFF (subtracting -1) just to increment the pointer by one. adiw works on the top four register pairs, three of which are conveniently caller-saved.
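
To make the adiw point concrete, here's an illustrative fragment (mine, not taken from the listing above) of the two ways to add one to a 16-bit pointer; the register choices are arbitrary:

        adiw r26, 1             /* pointer in an upper pair (here X): one word */

        subi r22, 0xFF          /* pointer in r23:r22: add 1 by subtracting -1... */
        sbci r23, 0xFF          /* ...two words, and these immediates only exist for r16-r31 */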

In other cases I've seen all kinds of relatively common idioms get translated very poorly. A classic one is constructing a word from two bytes, which in my case happens as TWI interrupts feed them to me one at a time:

  w = b | (c<<8);
  68:   99 27           eor     r25, r25
  6a:   33 27           eor     r19, r19
  6c:   32 2f           mov     r19, r18
  6e:   22 27           eor     r18, r18
  70:   82 2b           or      r24, r18
  72:   93 2b           or      r25, r19

w is 16 bits; b and c are both 8. This should be doable with just two movs, but instead it takes six instructions. I understand that the variables are being promoted to int, but it's still inefficient.
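
A source-level workaround (just a sketch with a made-up helper name; whether it actually collapses to two movs will depend on the compiler version) is to store the two halves of the word explicitly, e.g. through a union, so the compiler never sees the promoted shift at all:

#include <stdint.h>

/* Sketch only: build the word by writing its bytes directly instead of
   relying on the compiler to collapse the promoted shift/or.  AVR is
   little-endian, so b[0] is the low byte. */
static inline uint16_t bytes_to_word(uint8_t lo, uint8_t hi)
{
    union {
        uint16_t w;
        uint8_t  b[2];
    } u;

    u.b[0] = lo;
    u.b[1] = hi;
    return u.w;
}

/* usage:  w = bytes_to_word(b, c);   instead of   w = b | (c << 8); */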


I understand that the larger issues of register selection are rather systemic and thus probably hard to solve, but a lot of these smaller idioms seem to be crying out for a simple RTL match that can generate efficient code. I know next to nothing about gcc specifically or compilers in general, but I know gcc has a peephole optimizer, and AFAIU that's what's supposed to handle these cases.

What would it take to collect and implement improvements for idioms like this? I run into them fairly regularly, and if someone knows how to turn these observations into something the compiler can use, I'll gladly write them up and document what I find. Heck, if it's easy enough to add them to the compiler myself, I'll do the whole thing.
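
For instance, a scratch file along these lines, compiled with "avr-gcc -Os -S" and diffed between compiler versions, would be one way to keep track of them (the names here are just for illustration, and the bootloader_eeprom_read_byte prototype is assumed from the call above):

/* idioms.c -- small test cases for the patterns discussed above.
   Compile with "avr-gcc -Os -S idioms.c" and compare the .s output
   between versions. */
#include <stdint.h>

#define PARTPINS 28

extern uint8_t bootloader_eeprom_read_byte(uint8_t *addr);  /* prototype assumed */

uint8_t pinconfig[PARTPINS];

/* Fixed-count copy loop through a function call: ideally no saved
   registers and no stack frame. */
void load_eeprom_defaults(void)
{
    uint8_t *ee = 0;
    uint8_t *ram = pinconfig;
    uint8_t i;

    for (i = 0; i < PARTPINS; i++)
        *(ram++) = bootloader_eeprom_read_byte(ee++);
}

/* Byte pair -> word: ideally two movs. */
uint16_t combine(uint8_t b, uint8_t c)
{
    return (uint16_t)b | ((uint16_t)c << 8);
}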

TIA,
   Omega
   aka Erik Walthinsen
   address@hidden



