[avr-libc-dev] Improved optimizer for mixed 32 / 16 / 8 bit expressions

From:	Björn Haase
Subject:	[avr-libc-dev] Improved optimizer for mixed 32 / 16 / 8 bit expressions
Date:	Wed, 01 Dec 2004 22:52:26 +0100

Hi,

I have been working on improving how avr-gcc handles variables of
different sizes, e.g. operations combining objects of type int8_t,
uint16_t, int32_t, etc.

My main objective has been to reduce the number of registers required by the
different operations.
For example, I did not like that when compiling expressions like

extern uint32_t t1;
t1 = 3;

the compiler always generated code such as

        ldi r18,lo8(3)
        ldi r19,hi8(3)
        ldi r20,hlo8(3)
        ldi r21,hhi8(3)
        sts t1,r18
        sts (t1)+1,r19
        sts (t1)+2,r20
        sts (t1)+3,r21

using lots of unnecessary registers and load instructions.
My present compiler version generates

        ldi r25,lo8(3)
        sts t1,r25
        sts (t1)+1,__zero_reg__
        sts (t1)+2,__zero_reg__
        sts (t1)+3,__zero_reg__ ; combined instr

instead. 
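
As a rough illustration of why one ldi is enough here, the following C sketch splits a 32-bit constant into the four bytes that end up at t1 .. t1+3 and counts how many of them are non-zero. The helper names split_bytes and nonzero_bytes are made up for this example and are not part of the compiler; zero bytes are the ones that can be stored straight from __zero_reg__.

```c
#include <stdint.h>

/* Split a 32-bit constant into the bytes stored at t1, t1+1, t1+2, t1+3.
   (Hypothetical helper, for illustration only.) */
void split_bytes(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value);        /* lo8(value)  */
    out[1] = (uint8_t)(value >> 8);   /* hi8(value)  */
    out[2] = (uint8_t)(value >> 16);  /* hlo8(value) */
    out[3] = (uint8_t)(value >> 24);  /* hhi8(value) */
}

/* Count the bytes that cannot be stored from __zero_reg__,
   i.e. the non-zero ones. */
int nonzero_bytes(uint32_t value)
{
    uint8_t b[4];
    int n = 0;

    split_bytes(value, b);
    for (int i = 0; i < 4; i++) {
        if (b[i] != 0)
            n++;
    }
    return n;
}
```

For the constant 3 only the lowest byte is non-zero, which matches the single ldi in the listing above.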

In the above example the improvement is obvious.
For other cases (e.g. sign-extension of int8_t variables),
however, some optimization steps could either help or hurt, depending on the
kind of code.

I therefore wish to test my compiler with "real" code, so that I do not
end up tuning the instruction patterns for
artificial test cases that might not be helpful in practice.
I think this is especially important, since the main benefit of the additional
patterns is that the compiler needs to allocate fewer registers. This effect can
only be observed in larger functions.

Particularly useful for the test would be "real" code fragments that
work heavily with variables and constants of different sizes,
e.g. code with rather large functions and plenty of type casts.
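
To give an idea of the kind of fragment meant here, a minimal made-up sketch follows. The function mix_widths and its parameters are purely hypothetical and are not taken from any real project; real code would of course be larger.

```c
#include <stdint.h>

/* Hypothetical test fragment mixing 8-, 16- and 32-bit operands:
   signed 8-bit samples are scaled by an unsigned 16-bit gain and
   accumulated, together with a small constant, into a 32-bit sum. */
int32_t mix_widths(const int8_t *samples, uint8_t count,
                   uint16_t gain, int32_t offset)
{
    int32_t acc = offset;

    for (uint8_t i = 0; i < count; i++) {
        int16_t s = samples[i];             /* sign-extend 8 -> 16 bits   */
        acc += (int32_t)s * (int32_t)gain;  /* widen both operands to 32  */
        acc += 3;                           /* small constant, 32-bit add */
    }
    return acc;
}
```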

Yours,

Björn


P.S.

Even more useful, of course, would be someone willing
to have a look at my changes to the
machine description in order to check them :-) or to suggest further
improvements.


P.P.S.: Two further examples of compilation results (all with -Os and -fnew-ra)

----------------------------

extern int32_t source3;

int32_t sample (int8_t source1, uint16_t source2)
{ 
  return ((source1 + source2) + source3);
}

/* modified compiler */
sample:
        movw r18,r22
        clr __tmp_reg__
        sbrc r24,7
        com __tmp_reg__
        add r18,r24
        adc r19,__tmp_reg__ ; combined instr
        lds r22,source3
        lds r23,(source3)+1
        lds r24,(source3)+2
        lds r25,(source3)+3
        add r22,r18
        adc r23,r19
        adc r24,__zero_reg__
        adc r25,__zero_reg__ ; combined instr
        ret

/* Head gcc 4 */

sample:
        clr r25
        sbrc r24,7
        com r25
        add r24,r22
        adc r25,r23
        movw r18,r24
        clr r20
        clr r21
        movw r24,r20
        movw r22,r18
        lds r18,source3
        lds r19,(source3)+1
        lds r20,(source3)+2
        lds r21,(source3)+3
        add r22,r18
        adc r23,r19
        adc r24,r20
        adc r25,r21
        ret
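
For readers who want to check the listings against the C semantics, here is a small host-side model of what sample computes, assuming avr-gcc's default 16-bit int. The function name sample_avr_model is made up, and source3 is passed as a parameter instead of being an extern. With a 16-bit int, source1 + source2 is evaluated as a 16-bit unsigned value and then zero-extended to 32 bits, which is why the upper two bytes can be added with __zero_reg__.

```c
#include <stdint.h>

/* Host-side model of sample(), assuming a 16-bit int as on avr-gcc.
   (Hypothetical helper; source3 is a parameter here for testability.) */
int32_t sample_avr_model(int8_t source1, uint16_t source2, int32_t source3)
{
    /* int8_t is sign-extended to 16 bits; the sum wraps modulo 2^16
       because uint16_t is unsigned int on the AVR. */
    uint16_t sum16 = (uint16_t)((uint16_t)(int16_t)source1 + source2);

    /* The unsigned 16-bit sum is zero-extended before the 32-bit add. */
    return source3 + (int32_t)sum16;
}
```

On a host with a 32-bit int the original expression would not wrap at 16 bits, so the explicit casts above are needed to reproduce the AVR result.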


-----------------------------------
extern int32_t *target;

void sample2 (int8_t source1)
{ 
  *target = source1 + 3;
}


/* modified compiler */
sample2:
        lds r30,target
        lds r31,(target)+1
        clr r25
        sbrc r24,7
        com r25
        adiw r24,3
        clr __tmp_reg__
        sbrc r25,7
        com __tmp_reg__
        st Z,r24
        std Z+1,r25
        std Z+2,__tmp_reg__
        std Z+3,__tmp_reg__ ; combined instr
        ret


/* Head gcc 4 */

sample2:
/* prologue: frame size=0 */
/* prologue end (size=0) */
         ;  basic block 0
        lds r30,target
        lds r31,(target)+1
        clr r25
        sbrc r24,7
        com r25
        adiw r24,3
        clr r26
        sbrc r25,7
        com r26
        mov r27,r26
        st Z,r24
        std Z+1,r25
        std Z+2,r26
        std Z+3,r27
/* epilogue: frame size=0 */
        ret
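
The same kind of host-side model can be written for sample2, again assuming a 16-bit int. The name sample2_model is made up for illustration. The sum is formed in 16 bits and then sign-extended, so the two upper bytes are always identical, which is what allows both to be stored from a single __tmp_reg__.

```c
#include <stdint.h>

/* Host-side model of the four bytes sample2() stores through target,
   returned as one 32-bit value. (Hypothetical helper.) */
uint32_t sample2_model(int8_t source1)
{
    /* clr/sbrc/com r25 followed by adiw r24,3:
       sign-extend to 16 bits, then add 3. */
    int16_t sum16 = (int16_t)(source1 + 3);

    /* clr/sbrc/com __tmp_reg__: 0xFF if the sum is negative, else 0x00. */
    uint8_t ext = (sum16 < 0) ? 0xFF : 0x00;

    uint8_t b0 = (uint8_t)((uint16_t)sum16);
    uint8_t b1 = (uint8_t)((uint16_t)sum16 >> 8);

    /* Bytes 2 and 3 are both the extension byte. */
    return (uint32_t)b0
         | ((uint32_t)b1 << 8)
         | ((uint32_t)ext << 16)
         | ((uint32_t)ext << 24);
}
```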
