Re: [avr-gcc-list] Missed optimisations
From: Wouter van Gulik
Subject: Re: [avr-gcc-list] Missed optimisations
Date: Mon, 14 Jan 2008 16:33:58 +0100
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)
David Brown wrote:
> The code is basically good (the "swap" instruction is used for the
> shifts, which is very nice - a big improvement over the older 3.4 gcc),
> but there are a few missed optimisations shown here that are probably
> quite common in other code.
> Why is the address of crcTable8n loaded into r18:r19 first, before being
> copied into r30:r31 for the address calculation? It seems that this
> happens when the address is reused - if it is not reused, then r30:r31
> are loaded directly. However, the reuse does not benefit from having
> the address in a register - the "add r30, r18" and "adc r31, r19" on
> lines 68 and 69 could be replaced with subi and sbci instructions to
> save space and time, and to free registers r18:r19. On most RISC cpus,
> storing the address in a register for reuse would be a benefit, which is
> probably why this code is generated - on the AVR, it is not helpful (at
> least, not here).
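The subi/sbci replacement works because the AVR has no general add-immediate instruction (ADIW only takes a 6-bit constant), so a constant is added to a register pair by subtracting its two's-complement negation: subi on the low byte, then sbci on the high byte, which also subtracts the carry left by subi as a borrow. The host-side C sketch below models that byte-wise arithmetic so it can be checked against a plain 16-bit addition; the function name is hypothetical and nothing here is taken from the original code.

```c
#include <stdint.h>

/* Model of "subi rL,lo8(-(k))" followed by "sbci rH,hi8(-(k))" on the
 * register pair rH:rL (e.g. r31:r30). Subtracting -k adds k modulo 2^16. */
uint16_t add_via_subi_sbci(uint16_t z, uint16_t k)
{
    uint16_t neg = (uint16_t)(0u - k);      /* two's complement of k */
    uint8_t lo = (uint8_t)z;                /* low register, e.g. r30 */
    uint8_t hi = (uint8_t)(z >> 8);         /* high register, e.g. r31 */
    uint8_t neg_lo = (uint8_t)neg;
    uint8_t neg_hi = (uint8_t)(neg >> 8);

    unsigned carry = neg_lo > lo;           /* subi sets C on unsigned borrow */
    lo = (uint8_t)(lo - neg_lo);            /* subi */
    hi = (uint8_t)(hi - neg_hi - carry);    /* sbci subtracts C as well */
    return (uint16_t)((hi << 8) | lo);
}
```

For example, add_via_subi_sbci(0x00FF, 0x0001) yields 0x0100, the same as adding the two values with add/adc, but without needing the constant in r18:r19.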
I don't know. But it happens fairly often that registers are not reused
when they could have been.
Maybe it is because lpm is an asm macro. Try replacing it with a normal
table index. If that helps, write the "ld r??, Z" in an assembler macro
to be sure.
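The experiment suggested above can be set up by writing the same routine twice: once reading the table from flash with avr-libc's pgm_read_byte() (which wraps lpm in inline assembly), and once with an ordinary array index, then comparing the avr-gcc output for the two. The sketch below assumes a nibble-wise CRC-8 with polynomial 0x07, because the actual crcTable8n contents are not shown in this message; the function names and the RAM copy of the table are hypothetical, and the #else branch only exists so the file also compiles on a host.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host build: no separate flash address space, so read the table directly. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Nibble table for CRC-8, polynomial 0x07 (assumed, not from the thread). */
static const uint8_t crcTable8n[16] PROGMEM = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15,
    0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D
};

/* Same table in RAM, for the "normal table index" variant. */
static const uint8_t crcTable8n_ram[16] = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15,
    0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D
};

/* Variant 1: table in flash, read through the lpm-based macro. */
uint8_t crc8_update_pgm(uint8_t crc, uint8_t data)
{
    crc = (uint8_t)((crc << 4) ^ pgm_read_byte(&crcTable8n[(crc >> 4) ^ (data >> 4)]));
    crc = (uint8_t)((crc << 4) ^ pgm_read_byte(&crcTable8n[(crc >> 4) ^ (data & 0x0f)]));
    return crc;
}

/* Variant 2: plain array indexing, no inline assembly involved. */
uint8_t crc8_update_ram(uint8_t crc, uint8_t data)
{
    crc = (uint8_t)((crc << 4) ^ crcTable8n_ram[(crc >> 4) ^ (data >> 4)]);
    crc = (uint8_t)((crc << 4) ^ crcTable8n_ram[(crc >> 4) ^ (data & 0x0f)]);
    return crc;
}
```

Compiling this for an AVR target with -S and comparing how the two functions keep the table address in registers would show whether the asm macro is what stops the reuse.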
> Secondly, the "(data & 0x0f)" clause generates messy 16-bit code. I
> realise C requires integer promotion in such cases, but it's important
> to try to remove unnecessary code such as loading the high register with
> zero, then anding it with zero, then eoring it. gcc version 3.4.6 was
> sometimes marginally better at such code. It should be noted that the
> quality of the generated code depends very much on the exact expression
> - the original "[(crc >> 4) ^ (data & 0x0f)]" generates poor code, while
> the equivalent "[((crc >> 4) ^ data) & 0x0f]" generates tight code.
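The two index expressions select the same table entry: for an 8-bit crc, crc >> 4 already lies in 0..15, so masking data before the XOR or masking the XOR result afterwards yields the same nibble. The short C check below assumes crc and data are uint8_t, as in a byte-wise CRC; the function names are hypothetical.

```c
#include <stdint.h>

/* Original form: "[(crc >> 4) ^ (data & 0x0f)]" */
unsigned index_original(uint8_t crc, uint8_t data)
{
    return (unsigned)((crc >> 4) ^ (data & 0x0f));
}

/* Rewritten form: "[((crc >> 4) ^ data) & 0x0f]" */
unsigned index_rewritten(uint8_t crc, uint8_t data)
{
    return (unsigned)(((crc >> 4) ^ data) & 0x0f);
}

/* Returns 1 if both forms agree for every pair of 8-bit inputs. */
int index_forms_agree(void)
{
    for (unsigned crc = 0; crc < 256; ++crc)
        for (unsigned data = 0; data < 256; ++data)
            if (index_original((uint8_t)crc, (uint8_t)data) !=
                index_rewritten((uint8_t)crc, (uint8_t)data))
                return 0;
    return 1;
}
```

Because the values are identical, choosing the rewritten form is purely a matter of which one the compiler turns into better code.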
Hmm, yes, it really gets messy on r31/r23:
    62 0020 F0E0  ldi  r31,lo8(0)   ; load with 0
    63 0022 70E0  ldi  r23,lo8(0)   ; load with 0
    64 0024 6F70  andi r22,lo8(15)  ; low byte of (data & 0x0f)
    65 0026 7070  andi r23,hi8(15)  ; AND with 0: r23 stays 0
    66 0028 E627  eor  r30,r22      ;
    67 002a F727  eor  r31,r23      ; zero XOR zero == 0
    68 002c E20F  add  r30,r18      ; add table address held in r19:r18
    69 002e F31F  adc  r31,r19      ;
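The r23 instructions in this listing are the high-byte half of C's integer promotion: on the AVR, int is 16 bits, so crc >> 4 and data & 0x0f are evaluated as 16-bit values. Both operands are at most 15, so the high byte of their XOR is always zero; r31 only needs clearing once for the later adc, and the ldi/andi on r23 plus the eor into r31 do no useful work. The host-side check below illustrates that reasoning, modelling the 16-bit int with uint16_t; the function names are hypothetical.

```c
#include <stdint.h>

/* High byte of the 16-bit index "(crc >> 4) ^ (data & 0x0f)",
 * corresponding to r31 after "eor r31,r23" in the listing. */
uint8_t index_high_byte(uint8_t crc, uint8_t data)
{
    uint16_t crc_part  = (uint16_t)(crc >> 4);     /* 0..15, high byte 0 */
    uint16_t data_part = (uint16_t)(data & 0x0f);  /* hi8(15) == 0 */
    uint16_t index     = (uint16_t)(crc_part ^ data_part);
    return (uint8_t)(index >> 8);
}

/* Returns 1 if the high byte is zero for every pair of 8-bit inputs. */
int high_byte_always_zero(void)
{
    for (unsigned crc = 0; crc < 256; ++crc)
        for (unsigned data = 0; data < 256; ++data)
            if (index_high_byte((uint8_t)crc, (uint8_t)data) != 0)
                return 0;
    return 1;
}
```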
This is a known "feature". The patches Andrew Hutchinson is (I believe)
working on are supposed to improve this.
I am wondering why r31 and r23 are loaded up front, before the
operations that use them. It seems that gcc 4.2.x moves the loading of
variables further away from their use, but this does not benefit the
AVR.
HTH,
Wouter