[avr-libc-dev] Improved optimizer for mixed 32 / 16 / 8 bit expressions
From: Björn Haase
Subject: [avr-libc-dev] Improved optimizer for mixed 32 / 16 / 8 bit expressions
Date: Wed, 01 Dec 2004 22:52:26 +0100
Hi,
I have been working on improving how avr-gcc handles
variables of different sizes, e.g. operations combining objects of
type int8_t, uint16_t, int32_t, etc.
My main objective has been to reduce the number of registers required by
these operations.
For example, I did not like that, when compiling an expression like
extern uint32_t t1;
t1 = 3;
the compiler always generated code such as
ldi r18,lo8(3)
ldi r19,hi8(3)
ldi r20,hlo8(3)
ldi r21,hhi8(3)
sts t1,r18
sts (t1)+1,r19
sts (t1)+2,r20
sts (t1)+3,r21
using four registers and four ldi instructions where one of each suffices.
My present compiler version generates
ldi r25,lo8(3)
sts t1,r25
sts (t1)+1,__zero_reg__
sts (t1)+2,__zero_reg__
sts (t1)+3,__zero_reg__ ; combined instr
instead.
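The saving in the t1 example comes from the byte structure of the constant: of the four bytes selected by lo8(), hi8(), hlo8() and hhi8(), only the lowest is non-zero, so the other three can be stored straight from __zero_reg__. The host-side C sketch below models that bookkeeping. split_u32() and nonzero_bytes() are hypothetical helpers for illustration, not part of avr-gcc, and treating "non-zero byte" as "needs a scratch register" is a simplifying assumption.

```c
#include <stdint.h>

/* Split a 32-bit value into the bytes that lo8(), hi8(), hlo8() and
   hhi8() select, least significant first. This is also the order in
   which the bytes end up at t1, t1+1, t1+2 and t1+3 on the AVR. */
static void split_u32(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: count the non-zero bytes of a constant.
   Assuming zero bytes are stored from __zero_reg__, only these
   bytes would need an ldi into a scratch register. */
static int nonzero_bytes(uint32_t v)
{
    uint8_t b[4];
    int n = 0;

    split_u32(v, b);
    for (int i = 0; i < 4; i++) {
        if (b[i] != 0) {
            n++;
        }
    }
    return n;
}
```

For the constant 3, nonzero_bytes() reports a single byte, matching the one ldi in the new output, whereas the old output loads all four bytes.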
In the t1 example the improvement is obvious.
In other cases (e.g. sign extension of int8_t variables), however,
some optimization steps can either help or hurt, depending on the
code.
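On the AVR, sign-extending an int8_t is done with the three-instruction idiom clr / sbrc rL,7 / com that appears in the listings of the P.P.S. below. The C sketch that follows models that idiom on a host machine; sign_byte() and widen_s8() are hypothetical names chosen for illustration, and the model only shows what value the idiom produces, not how the register allocator places it.

```c
#include <stdint.h>

/* C model of the sign-extension idiom
       clr  rH        ; rH = 0x00
       sbrc rL,7      ; skip the next instruction if bit 7 of rL is clear
       com  rH        ; rH = ~rH = 0xFF
   It yields the byte that goes above an int8_t when it is widened. */
static uint8_t sign_byte(uint8_t low)
{
    uint8_t high = 0x00;              /* clr  */
    if (low & 0x80) {                 /* sbrc ...,7 */
        high = (uint8_t)~high;        /* com  */
    }
    return high;
}

/* Widen an int8_t to int32_t byte by byte: the same sign byte
   fills bytes 1, 2 and 3. */
static int32_t widen_s8(int8_t x)
{
    uint8_t lo = (uint8_t)x;
    uint8_t s = sign_byte(lo);
    uint32_t u = (uint32_t)lo
               | ((uint32_t)s << 8)
               | ((uint32_t)s << 16)
               | ((uint32_t)s << 24);
    return (int32_t)u;
}
```

Because one computed sign byte serves all upper bytes, it can live in a single register (or __tmp_reg__) instead of occupying one register per byte.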
I would therefore like to test my compiler with "real" code, so that I
do not end up tuning the instruction patterns for artificial test cases
that might not be helpful in practice.
I think this is especially important because the main benefit of the additional
patterns is that the compiler needs to allocate fewer registers, an effect that
can only be observed in larger functions.
Particularly useful for such tests would be "real" code fragments that
make heavy use of variables and constants of different sizes,
e.g. code with rather large functions containing plenty of type casts.
Yours,
Björn
P.S.
Even more useful, of course, would be someone willing to review my
changes to the machine description :-) or to suggest further
improvements.
P.P.S.: Two further examples of compilation results (both compiled with -Os and -fnew-ra):
----------------------------
extern int32_t source3;
int32_t sample (int8_t source1, uint16_t source2)
{
return ((source1 + source2) + source3);
}
/* modified compiler */
sample:
movw r18,r22
clr __tmp_reg__
sbrc r24,7
com __tmp_reg__
add r18,r24
adc r19,__tmp_reg__ ; combined instr
lds r22,source3
lds r23,(source3)+1
lds r24,(source3)+2
lds r25,(source3)+3
add r22,r18
adc r23,r19
adc r24,__zero_reg__
adc r25,__zero_reg__ ; combined instr
ret
/* Head gcc 4 */
sample:
clr r25
sbrc r24,7
com r25
add r24,r22
adc r25,r23
movw r18,r24
clr r20
clr r21
movw r24,r20
movw r22,r18
lds r18,source3
lds r19,(source3)+1
lds r20,(source3)+2
lds r21,(source3)+3
add r22,r18
adc r23,r19
adc r24,r20
adc r25,r21
ret
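Both listings for sample() add the upper two bytes of source3 to zero (adc with __zero_reg__, or clr r20 / clr r21 in the HEAD output). That follows from the C type rules with avr-gcc's 16-bit int: source1 + source2 has type unsigned int, which is zero-extended when converted to int32_t. The sketch below is a hypothetical host-side model (sample_model() is not part of the original code) that reproduces those 16-bit semantics on a machine with 32-bit int.

```c
#include <stdint.h>

/* Host-side model of sample(), assuming avr-gcc's 16-bit int.
   source1 + source2 is computed as a 16-bit unsigned int, i.e.
   it wraps modulo 2^16; the cast to uint16_t reproduces that. */
static int32_t sample_model(int8_t source1, uint16_t source2, int32_t source3)
{
    uint16_t sum16 = (uint16_t)(source1 + source2);

    /* A 16-bit unsigned value converted to int32_t is zero-extended,
       so only the low two bytes of source3 see a non-zero addend. */
    return (int32_t)((uint32_t)source3 + (uint32_t)sum16);
}
```

For example, sample_model(-1, 0, 0) returns 65535, whereas the same C expression compiled for a host with 32-bit int would yield -1.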
-----------------------------------
extern int32_t *target;
void sample2 (int8_t source1)
{
*target = source1 + 3;
}
/* modified compiler */
sample2:
lds r30,target
lds r31,(target)+1
clr r25
sbrc r24,7
com r25
adiw r24,3
clr __tmp_reg__
sbrc r25,7
com __tmp_reg__
st Z,r24
std Z+1,r25
std Z+2,__tmp_reg__
std Z+3,__tmp_reg__ ; combined instr
ret
/* Head gcc 4 */
sample2:
/* prologue: frame size=0 */
/* prologue end (size=0) */
; basic block 0
lds r30,target
lds r31,(target)+1
clr r25
sbrc r24,7
com r25
adiw r24,3
clr r26
sbrc r25,7
com r26
mov r27,r26
st Z,r24
std Z+1,r25
std Z+2,r26
std Z+3,r27
/* epilogue: frame size=0 */
ret
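In sample2() the situation is reversed: source1 + 3 has type (signed) int, so converting it to int32_t sign-extends the 16-bit sum. That is why both listings repeat the clr / sbrc r25,7 / com idiom after adiw; the modified compiler then stores __tmp_reg__ twice instead of spending r26 and r27. The hypothetical host-side model below (sample2_model() and sample2_bytes() are illustrative names) shows the four bytes that end up at Z..Z+3, again assuming 16-bit int.

```c
#include <stdint.h>

/* Host-side model of sample2(), assuming avr-gcc's 16-bit int.
   source1 + 3 is a signed 16-bit int; its range -125 .. 130
   cannot overflow. Converting it to int32_t sign-extends it. */
static int32_t sample2_model(int8_t source1)
{
    int16_t sum16 = (int16_t)(source1 + 3);
    return (int32_t)sum16;
}

/* The bytes written to *target, least significant first (Z .. Z+3).
   Bytes 2 and 3 are both copies of the sign of byte 1. */
static void sample2_bytes(int8_t source1, uint8_t out[4])
{
    uint32_t v = (uint32_t)sample2_model(source1);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}
```

With source1 = -5 the stored bytes are FE FF FF FF; with source1 = 127 they are 82 00 00 00.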