Re: [avr-gcc-list] Missed optimisations

From:	Wouter van Gulik
Subject:	Re: [avr-gcc-list] Missed optimisations
Date:	Mon, 14 Jan 2008 16:33:58 +0100
User-agent:	Thunderbird 2.0.0.9 (Windows/20071031)

David Brown wrote:

The code is basically good (the "swap" instruction is used for the shifts, which is very nice - a big improvement over the older 3.4 gcc), but there are a few missed optimisations shown here that are probably quite common in other code.
Why is the address of crcTable8n loaded into r18:r19 first, before being copied into r30:r31 for the address calculation? It seems that this happens when the address is reused - if it is not reused, then r30:r31 are loaded directly. However, the reuse does not benefit from having the address in a register - the "add r30, r18" and "adc r31, r19" on lines 68 and 69 could be replaced with subi and sbci instructions to save space and time, and to free registers r18:r19. On most RISC cpus, storing the address in a register for reuse would be a benefit, which is probably why this code is generated - on the AVR, it is not helpful (at least, not here).
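
For readers unfamiliar with the subi/sbci idiom mentioned above: the AVR has no add-immediate instruction, so a 16-bit constant is usually added by subtracting its negation, low byte first and then high byte with borrow. The following is only a host-side sketch of that arithmetic in C; the helper name add_const_via_subi_sbci is made up for illustration and does not come from the original code.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): models
 *     subi rl, lo8(-(k))
 *     sbci rh, hi8(-(k))
 * acting on a 16-bit register pair holding z. */
static uint16_t add_const_via_subi_sbci(uint16_t z, uint16_t k)
{
    /* Negated constant, as the toolchain would resolve it. */
    uint16_t neg    = (uint16_t)(0u - k);
    uint8_t  neg_lo = (uint8_t)(neg & 0xFFu);
    uint8_t  neg_hi = (uint8_t)(neg >> 8);

    uint8_t  z_lo = (uint8_t)(z & 0xFFu);
    uint8_t  z_hi = (uint8_t)(z >> 8);

    /* subi: subtract the low byte; carry is set when a borrow occurs. */
    uint8_t  borrow = (z_lo < neg_lo) ? 1u : 0u;
    uint8_t  r_lo   = (uint8_t)(z_lo - neg_lo);

    /* sbci: subtract the high byte and the borrow from the low byte. */
    uint8_t  r_hi   = (uint8_t)(z_hi - neg_hi - borrow);

    return (uint16_t)(((uint16_t)r_hi << 8) | r_lo);
}
```

The result equals (z + k) modulo 65536, which is why no register pair has to hold the table address just to add it.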

I don't know, but it happens quite often that registers are not reused when they could have been. Maybe it is because lpm is a macro. Try replacing it with a normal table index. If that helps, write the "ld r??, Z" in an assembler macro to be sure.

Secondly, the "(data & 0x0f)" clause generates messy 16-bit code. I realise C requires integer promotion in such cases, but it's important to try to remove unnecessary code such as loading the high register with zero, then anding it with zero, then eoring it. gcc version 3.4.6 was sometimes marginally better at such code. It should be noted that the quality of the generated code depends very much on the exact expression - the original "[(crc >> 4) ^ (data & 0x0f)]" generates poor code, while the equivalent "[((crc >> 4) ^ data) & 0x0f]" generates tight code.
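
The equivalence of the two index expressions can be checked on a host machine with a small sketch like the one below. It assumes crc and data are both 8-bit unsigned values (the original declarations are not shown in this thread); the function names are invented for illustration. With an 8-bit crc, "crc >> 4" never exceeds 15, so masking before or after the XOR gives the same index.

```c
#include <stdint.h>

/* Index form quoted as generating poor code.
 * Assumption: crc and data are uint8_t. */
static uint8_t index_original(uint8_t crc, uint8_t data)
{
    return (uint8_t)((crc >> 4) ^ (data & 0x0f));
}

/* Index form quoted as generating tight code. */
static uint8_t index_rewritten(uint8_t crc, uint8_t data)
{
    return (uint8_t)(((crc >> 4) ^ data) & 0x0f);
}

/* Counts the (crc, data) pairs for which the two forms differ,
 * trying all 256 * 256 combinations. */
static unsigned count_index_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned crc = 0; crc < 256; ++crc) {
        for (unsigned data = 0; data < 256; ++data) {
            if (index_original((uint8_t)crc, (uint8_t)data) !=
                index_rewritten((uint8_t)crc, (uint8_t)data)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

If crc were wider than 8 bits, "crc >> 4" could have bits set above bit 3 and the two forms would no longer agree, so the rewrite is only safe under the 8-bit assumption.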


Hmm, yes it really gets messy on r31/r23:

  62 0020 F0E0              ldi r31,lo8(0)  ; load with 0
  63 0022 70E0              ldi r23,lo8(0)  ; load with 0
  64 0024 6F70              andi r22,lo8(15);
  65 0026 7070              andi r23,hi8(15); re-load R23 with 0
  66 0028 E627              eor r30,r22     ;
  67 002a F727              eor r31,r23     ; zero XOR zero == 0
  68 002c E20F              add r30,r18     ;
  69 002e F31F              adc r31,r19     ;

This is a known "feature". The patches Andrew Hutchinson is working (?) on are supposed to improve this.

I am wondering why the loads of r31 and r23 are done before the operations. It seems that gcc 4.2.x moves the loading of variables a little further away from their use, but this does not benefit the AVR.


HTH,

Wouter
