Re: [avr-gcc-list] Shorter code?
From: hutchinsonandy
Subject: Re: [avr-gcc-list] Shorter code?
Date: Thu, 12 Jun 2008 16:41:42 -0400
I'm aware of this and similar issues and will be working to improve them.
Here is what the compiler sees as individual instructions:
768: 80 91 1d 01 lds r24, 0x011D ; MOVE BYTE (volatile maybe)
76c: 20 91 1c 01 lds r18, 0x011C ; MOVE BYTE
770: 99 27 eor r25, r25 ; ZERO EXTEND R24 to make unsigned int
772: 98 2f mov r25, r24 ; SHIFT LEFT 8 bits (this and the next line)
774: 88 27 eor r24, r24
776: 33 27 eor r19, r19 ; ZERO EXTEND R18 to make unsigned int
778: 82 2b or r24, r18 ; OR unsigned int (this and the next line)
77a: 93 2b or r25, r19
77c: 08 95 ret
Though you could restructure your code to avoid some of this, the key
problems are:
The left shift is seen by the compiler as one instruction, but in reality it
is two operations that can be accomplished with two byte moves (called MOVQI).
Zero extension is OK (the compiler can split this into MOVQI, and I assume it
does).
The OR is performed as a 16-bit operation when it could be split into two
byte ORs.
Taking the last point: the compiler can't propagate the zero in R19 into
the OR, since the OR is 16 bits wide. If this pattern were split,
that part would collapse to:
or r24,r18
ret
Similarly, if the SHIFT were split, the rest should collapse.
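A minimal C model of what "splitting" means here, assuming a 16-bit value lives in an AVR register pair with the low byte in the even register and the high byte in the odd one. The `RegPair` type and the function names are hypothetical illustrations, not GCC internals. Once the zero extensions, the shift, and the OR are written as byte operations, each byte OR has a constant-zero operand and degenerates into a plain move, which is the collapse described above.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit value in an AVR register pair:
   lo is the even register (e.g. r24), hi the odd one (e.g. r25). */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} RegPair;

/* Byte-split version of ((uint16_t)high << 8) | (uint16_t)low. */
static RegPair combine_split(uint8_t high, uint8_t low)
{
    /* ZERO EXTEND high: copy the byte, clear the upper byte. */
    RegPair a = { .lo = high, .hi = 0 };

    /* SHIFT LEFT 8 bits as two byte moves (MOVQI):
       upper byte <- lower byte, lower byte <- 0. */
    a.hi = a.lo;
    a.lo = 0;

    /* ZERO EXTEND low. */
    RegPair b = { .lo = low, .hi = 0 };

    /* 16-bit OR split into two byte ORs. Each has a constant-zero
       operand, so both reduce to plain moves: r.lo = low, r.hi = high. */
    RegPair r;
    r.lo = (uint8_t)(a.lo | b.lo);   /* 0 | low  */
    r.hi = (uint8_t)(a.hi | b.hi);   /* high | 0 */
    return r;
}

/* Compare the split version with the 16-bit expression for every input. */
static int split_matches_wide(void)
{
    for (unsigned h = 0; h < 256; h++) {
        for (unsigned l = 0; l < 256; l++) {
            uint16_t wide = (uint16_t)(((uint16_t)h << 8) | (uint16_t)l);
            RegPair p = combine_split((uint8_t)h, (uint8_t)l);
            if ((uint16_t)((p.hi << 8) | p.lo) != wide)
                return 0;
        }
    }
    return 1;
}
```

In this model the only work left after splitting is placing `high` in the upper register and `low` in the lower one, i.e. the two loads.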
There is limited splitting in GCC 4.3/4.4 (4.4 is better). This is
controlled by -fsplit-wide-types (turned off with -fno-split-wide-types).
It is often turned off, since splitting only some of the instructions
leaves a mixed bag of BYTE/WORD/LONG instructions that prevents some
optimisations.
The example you give is entirely within the scope of the work I am doing,
and hopefully the fix will reach 4.4.
There are some situations where it is much more difficult. These include
any arithmetic operations that involve the carry. With the exception of
shifts by multiples of 8 bits, such instructions cannot be split. This
covers ADD, SUB, COMPARE, and shifts by amounts other than a multiple of
8 bits.
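A small C sketch of why the carry blocks splitting, modelling a 16-bit add the way an AVR performs it with ADD followed by ADC. The function names are hypothetical. The high byte of a sum depends on the carry out of the low byte, so the two halves cannot be computed independently, whereas each byte of an OR depends only on the matching bytes of its inputs.

```c
#include <stdint.h>

/* 16-bit add done byte by byte, like ADD (low bytes) then ADC (high bytes). */
static uint16_t add16_bytewise(uint16_t x, uint16_t y)
{
    uint8_t xl = (uint8_t)x, xh = (uint8_t)(x >> 8);
    uint8_t yl = (uint8_t)y, yh = (uint8_t)(y >> 8);

    unsigned lo_sum = (unsigned)xl + yl;        /* ADD: may produce a carry */
    uint8_t lo = (uint8_t)lo_sum;
    uint8_t carry = (uint8_t)(lo_sum >> 8);     /* 0 or 1 */
    uint8_t hi = (uint8_t)(xh + yh + carry);    /* ADC: consumes the carry */
    return (uint16_t)((hi << 8) | lo);
}

/* Naive split that treats the bytes independently: wrong whenever the
   low-byte addition overflows, because the carry is lost. */
static uint16_t add16_no_carry(uint16_t x, uint16_t y)
{
    uint8_t lo = (uint8_t)((uint8_t)x + (uint8_t)y);
    uint8_t hi = (uint8_t)((x >> 8) + (y >> 8));
    return (uint16_t)((hi << 8) | lo);
}

/* OR has no carry: each result byte depends only on the same input byte,
   so the 16-bit operation splits cleanly into two byte ORs. */
static uint16_t or16_bytewise(uint16_t x, uint16_t y)
{
    uint8_t lo = (uint8_t)((uint8_t)x | (uint8_t)y);
    uint8_t hi = (uint8_t)((uint8_t)(x >> 8) | (uint8_t)(y >> 8));
    return (uint16_t)((hi << 8) | lo);
}
```

For example, `add16_no_carry(0x00FF, 0x0001)` returns 0x0000, while the carry-aware `add16_bytewise` returns the correct 0x0100.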
-----Original Message-----
From: Ruud Vlaming <address@hidden>
To: address@hidden
Sent: Thu, 12 Jun 2008 9:27 am
Subject: [avr-gcc-list] Shorter code?
Hi
Normally GCC generates well-optimized code, but
sometimes I wonder why GCC does simple things
in such a complicated way.
Here is an example:
#include <stdint.h>
/* uxTickCount is declared elsewhere: HighByte at 0x011D, LowByte at 0x011C */
uint16_t genGetTickCount(void)
{ return (((uint16_t) uxTickCount.HighByte) << 8) | (uint16_t) (uxTickCount.LowByte); }
generates:
00000768 <genGetTickCount>:
768: 80 91 1d 01 lds r24, 0x011D
76c: 20 91 1c 01 lds r18, 0x011C
770: 99 27 eor r25, r25
772: 98 2f mov r25, r24
774: 88 27 eor r24, r24
776: 33 27 eor r19, r19
778: 82 2b or r24, r18
77a: 93 2b or r25, r19
77c: 08 95 ret
whereas it could have been 12 bytes (!) shorter:
00000768 <genGetTickCount>:
768: 90 91 1d 01 lds r25, 0x011D
76c: 80 91 1c 01 lds r24, 0x011C
770: 08 95 ret
Is there a way to write the method defined above in C so that the compiler
generates this assembly? Some special combine function, maybe?
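One possible restructuring, sketched under an assumption: the post does not show how `uxTickCount` is declared, so the union below (with the hypothetical type name `TickCount` and member `Word`) is only a guess consistent with the listing, where `LowByte` sits at 0x011C and `HighByte` at 0x011D. On a little-endian target such as AVR, reading both bytes as one `uint16_t` gives the same value as the shift-and-OR expression, and type punning through a union is permitted in C. Whether avr-gcc then emits only the two `lds` instructions has not been verified here.

```c
#include <stdint.h>

/* Hypothetical declaration (not shown in the original post): LowByte at the
   lower address, HighByte at the next one, matching the listing
   (0x011C / 0x011D) and a little-endian uint16_t.
   The anonymous struct requires C11 or GNU C. */
typedef union {
    struct {
        uint8_t LowByte;
        uint8_t HighByte;
    };
    uint16_t Word;   /* hypothetical: both bytes viewed as one 16-bit value */
} TickCount;

TickCount uxTickCount;

/* Formulation from the post: rebuild the word from its two bytes. */
uint16_t genGetTickCount(void)
{
    return (uint16_t)(((uint16_t)uxTickCount.HighByte << 8)
                      | (uint16_t)uxTickCount.LowByte);
}

/* Alternative: a single 16-bit read of the same storage. This equals the
   function above only on a little-endian target such as AVR. */
uint16_t genGetTickCountWord(void)
{
    return uxTickCount.Word;
}
```

If `uxTickCount` is updated from an interrupt, both versions still read the two bytes non-atomically; the union changes only how the value is expressed to the compiler.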
Further, I don't know how much intelligence you can expect from the
compiler, but, for example, first clearing r25 and directly afterwards
filling it from r24 seems a real waste of effort. By direct inspection,
thus _without_ any knowledge of what is going on, this code could be
reduced in the following simple steps (ignore the addresses):
00000768 <genGetTickCount>:
768: 80 91 1d 01 lds r24, 0x011D
76c: 20 91 1c 01 lds r18, 0x011C
770: 99 27 eor r25, r25 // remove this, since it is directly overwritten afterwards
772: 98 2f mov r25, r24
774: 88 27 eor r24, r24
776: 33 27 eor r19, r19
778: 82 2b or r24, r18
77a: 93 2b or r25, r19 // remove this, since "or" with zero does not change the value of r25
77c: 08 95 ret
00000768 <genGetTickCount>:
768: 80 91 1d 01 lds r24, 0x011D
76c: 20 91 1c 01 lds r18, 0x011C
772: 98 2f mov r25, r24
774: 88 27 eor r24, r24
776: 33 27 eor r19, r19 //remove this, the register is unused.
778: 82 2b or r24, r18 // change into "mov", since r24 is zero
77c: 08 95 ret
00000768 <genGetTickCount>:
768: 80 91 1d 01 lds r24, 0x011D // load directly into r25, since the value in r24 is destroyed after the move
76c: 20 91 1c 01 lds r18, 0x011C
772: 98 2f mov r25, r24 // remove this, since r25 will be filled directly
774: 88 27 eor r24, r24 // remove this, since it is directly overwritten afterwards
778: 82 2f mov r24, r18
77c: 08 95 ret
00000768 <genGetTickCount>:
768: 90 91 1d 01 lds r25, 0x011D
76c: 20 91 1c 01 lds r18, 0x011C // load directly into r24, since r18 is unused after the move
778: 82 2f mov r24, r18 // remove this, since r24 will be filled directly
77c: 08 95 ret
00000768 <genGetTickCount>:
768: 90 91 1d 01 lds r25, 0x011D
76c: 80 91 1c 01 lds r24, 0x011C
77c: 08 95 ret
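The reduction steps above can be checked mechanically. The sketch below is a hypothetical C model, not a real AVR simulator: it executes each of the five listings on a 32-entry register file and compares the result in r25:r24 (where avr-gcc returns a `uint16_t`) for every possible pair of bytes at 0x011D and 0x011C. It tracks register values only; SREG flags and volatile access order are not modelled, although the final listing keeps the original load order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register-file model: only the 8-bit values of r0..r31 are
   tracked (no SREG flags). Registers start with an arbitrary pattern, so a
   listing that relied on a register already being zero would fail. */
typedef struct { uint8_t r[32]; } Regs;

static void regs_init(Regs *s) { memset(s->r, 0xA5, sizeof s->r); }

/* avr-gcc returns a uint16_t in r25 (high byte) : r24 (low byte). */
static uint16_t ret16(const Regs *s)
{
    return (uint16_t)((s->r[25] << 8) | s->r[24]);
}

/* m11d, m11c: the bytes stored at 0x011D (HighByte) and 0x011C (LowByte). */

static uint16_t listing_compiler(uint8_t m11d, uint8_t m11c)
{
    Regs s; regs_init(&s);
    s.r[24] = m11d;        /* lds r24, 0x011D */
    s.r[18] = m11c;        /* lds r18, 0x011C */
    s.r[25] ^= s.r[25];    /* eor r25, r25    */
    s.r[25] = s.r[24];     /* mov r25, r24    */
    s.r[24] ^= s.r[24];    /* eor r24, r24    */
    s.r[19] ^= s.r[19];    /* eor r19, r19    */
    s.r[24] |= s.r[18];    /* or  r24, r18    */
    s.r[25] |= s.r[19];    /* or  r25, r19    */
    return ret16(&s);      /* ret             */
}

static uint16_t listing_step1(uint8_t m11d, uint8_t m11c)
{
    Regs s; regs_init(&s);
    s.r[24] = m11d;        /* lds r24, 0x011D */
    s.r[18] = m11c;        /* lds r18, 0x011C */
    s.r[25] = s.r[24];     /* mov r25, r24    */
    s.r[24] ^= s.r[24];    /* eor r24, r24    */
    s.r[19] ^= s.r[19];    /* eor r19, r19    */
    s.r[24] |= s.r[18];    /* or  r24, r18    */
    return ret16(&s);
}

static uint16_t listing_step2(uint8_t m11d, uint8_t m11c)
{
    Regs s; regs_init(&s);
    s.r[24] = m11d;        /* lds r24, 0x011D */
    s.r[18] = m11c;        /* lds r18, 0x011C */
    s.r[25] = s.r[24];     /* mov r25, r24    */
    s.r[24] ^= s.r[24];    /* eor r24, r24    */
    s.r[24] = s.r[18];     /* mov r24, r18    */
    return ret16(&s);
}

static uint16_t listing_step3(uint8_t m11d, uint8_t m11c)
{
    Regs s; regs_init(&s);
    s.r[25] = m11d;        /* lds r25, 0x011D */
    s.r[18] = m11c;        /* lds r18, 0x011C */
    s.r[24] = s.r[18];     /* mov r24, r18    */
    return ret16(&s);
}

static uint16_t listing_final(uint8_t m11d, uint8_t m11c)
{
    Regs s; regs_init(&s);
    s.r[25] = m11d;        /* lds r25, 0x011D */
    s.r[24] = m11c;        /* lds r24, 0x011C */
    return ret16(&s);
}

/* Returns 1 if every listing yields (HighByte << 8) | LowByte for all inputs. */
static int listings_equivalent(void)
{
    for (unsigned v = 0; v <= 0xFFFFu; v++) {
        uint8_t hi = (uint8_t)(v >> 8), lo = (uint8_t)v;
        uint16_t want = (uint16_t)v;
        if (listing_compiler(hi, lo) != want || listing_step1(hi, lo) != want ||
            listing_step2(hi, lo) != want || listing_step3(hi, lo) != want ||
            listing_final(hi, lo) != want)
            return 0;
    }
    return 1;
}
```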
Could such post-compilation optimization steps be integrated into the
compiler?
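As a rough illustration of how such rules look inside an optimizer, here is a hypothetical C sketch of a peephole pass over a toy instruction list. The `Insn` representation, `peephole_step1`, and `build_listing` are invented for this example and are not GCC code; GCC's real passes work on RTL, a very different representation. It implements only the two rules from the first reduction step: a clear that is immediately overwritten by a `mov`, and an `or` whose source is known to hold zero. It ignores SREG flags, which a real pass would have to account for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified instruction model covering only the
   opcodes in this listing. SREG flags are ignored; a real pass would have
   to prove the flags set by a removed EOR/OR are not used later. */
typedef enum { OP_LDS, OP_EOR, OP_MOV, OP_OR, OP_RET, OP_DELETED } Opcode;

typedef struct {
    Opcode op;
    int rd;          /* destination register          */
    int rr;          /* source register (EOR/MOV/OR)  */
    uint16_t addr;   /* data-space address (LDS only) */
} Insn;

/* Rule 1: "eor rX, rX" directly followed by "mov rX, rY" (rY != rX) is
   dead, since the mov overwrites rX.
   Rule 2: "or rX, rY" where rY is known to hold zero leaves rX unchanged.
   Returns the number of instructions marked OP_DELETED. */
static size_t peephole_step1(Insn *code, size_t n)
{
    int known_zero[32] = {0};   /* nonzero if the register certainly holds 0 */
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        Insn *in = &code[i];
        switch (in->op) {
        case OP_EOR:
            if (in->rd == in->rr) {
                const Insn *next = (i + 1 < n) ? &code[i + 1] : NULL;
                if (next && next->op == OP_MOV &&
                    next->rd == in->rd && next->rr != in->rd) {
                    in->op = OP_DELETED;          /* rule 1 */
                    removed++;
                } else {
                    known_zero[in->rd] = 1;       /* eor rX, rX clears rX */
                }
            } else {
                known_zero[in->rd] = 0;
            }
            break;
        case OP_OR:
            if (known_zero[in->rr]) {
                in->op = OP_DELETED;              /* rule 2 */
                removed++;
            } else {
                known_zero[in->rd] = 0;
            }
            break;
        case OP_MOV:
            known_zero[in->rd] = known_zero[in->rr];
            break;
        case OP_LDS:
            known_zero[in->rd] = 0;               /* loaded value is unknown */
            break;
        default:
            break;
        }
    }
    return removed;
}

/* The compiler output from the post, in this toy model. */
static size_t build_listing(Insn code[9])
{
    const Insn listing[9] = {
        { OP_LDS, 24, 0, 0x011D },
        { OP_LDS, 18, 0, 0x011C },
        { OP_EOR, 25, 25, 0 },
        { OP_MOV, 25, 24, 0 },
        { OP_EOR, 24, 24, 0 },
        { OP_EOR, 19, 19, 0 },
        { OP_OR,  24, 18, 0 },
        { OP_OR,  25, 19, 0 },
        { OP_RET, 0, 0, 0 },
    };
    for (size_t i = 0; i < 9; i++)
        code[i] = listing[i];
    return 9;
}
```

Run on the listing from the post, this pass marks exactly the `eor r25, r25` and `or r25, r19` entries as deleted, matching the first reduction step; the later steps (dead-register removal and loading directly into the return registers) would need further rules.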
I'd like to hear your comments.
Ruud.
_______________________________________________
AVR-GCC-list mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list