
[Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]


From: Andrew Hutchinson
Subject: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]
Date: Thu, 10 Jan 2008 20:38:59 -0500
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)


I tried the earlier example:

  char *p1 = x;
  foo(p1++);
  foo(p1++);
  foo(p1++);

etc

Even with different register allocation, the result is bad.
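
For anyone who wants to reproduce this, a self-contained sketch of the test is
below (x, foo and the wrapper test() are placeholders for whatever the original
test really used; something like "avr-gcc -Os -S test.c" and a look at the .s
output shows the register shuffling):

extern char *x;
extern void foo(char *p);

void test(void)
{
        char *p1 = x;
        /* post-increment inside the argument: gcc computes p1+1 before
           each call, so the old and the new value are live together */
        foo(p1++);
        foo(p1++);
        foo(p1++);
}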

a) The basic flow that gcc creates is something like:

b=a+1;
R24=a;
foo(R24);
c=b+1;
R24=b;
foo(R24)

This needs only one variable to be saved over the call, but due to the overlapped lifetimes it creates two: both a and b must exist at the same time.
If the ordering were reversed, this would not be needed. For example:

R24=a;
b=a+1
foo(R24);
R24=b;
c=b+1;
foo(R24)

With the reordering, a is already dead when b is created, so we only need one register.

I am not experienced enough to know why gcc cannot optimise this case, but it looks like a weakness in gcc itself (not in gcc-avr).

b) The register costs used to bias register allocation are all equal for the AVR, so there is no preference for the ADIW registers (even when I removed the "!w").

So the backend does not indicate a preference between R16=R16+1, R14=R14+1, or R28=R28+1.

In current gcc, the frame pointer (r28:r29) does not get used for register allocation - clearly it would be the best "call saved" register here, since it could use ADIW and avoid moves. It looks like allocation is done assuming the frame pointer is in use; then, if no frame is actually required, R28:R29 is simply left unused. Allocation is never retried without the frame pointer.
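
One way to experiment with this by hand is GCC's explicit register variable
extension, pinning the pointer to r28 just to see what the R28-based code would
look like. This is only a rough sketch (x and foo assumed external as before;
gcc does not guarantee a local register variable stays in its register outside
of asm statements, and it may not cooperate at all if the function ends up
needing a frame):

extern char *x;
extern void foo(char *p);

void test(void)
{
        /* experiment only: pin p to r28:r29 (the Y register) */
        register char *p asm("r28") = x;

        foo(p); p++;
        foo(p); p++;
        foo(p); p++;
}

If gcc goes along with it, each call should reduce to a movw into r24:r25 plus
the call itself, with the increments done in place on r28:r29.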

I tried the improved foo(p); p++; form and it produces much better code, though still not using R28 (for the same reason). In this case the increment is specified after the function call, so we don't have overlapped register lifetimes - only one register is then used and everything becomes simple.

Andy

PS: Please report this as a bug - gcc should be better than this.



Wouter van Gulik wrote:
The RTL dump will tell me why it chose R14 before.


What do you mean by an RTL dump exactly? I tried looking through some dumps
but I could not make sense of them. I used -dP and --save-temps, but they all
looked the same to me.

I recollect there were some odd "!" markers that stop reload from using the
ADIW registers for certain operations.

That might be the reason. If so, I'll have to dig out why they were put in -
maybe to fix some other problem.


Well, for all possible ADIW uses (addsi, addhi) it's a "!w".
If this could be undone, much pointer arithmetic could be done better, I
guess/hope.

Any clue on why "foo(p++)" gives even poorer code compared to "foo(p); p++"?

HTH,

Wouter


---- Wouter van Gulik <address@hidden> wrote:
Wouter van Gulik wrote:

Note that in some cases it could be very interesting to use the r27, or Y,
register.

Should have written R28 of course.

Since gcc seems to be down at the moment, I did some more testing.

Now consider this example:
/* assumed declarations for the externals used below */
extern char *x;
extern void foo(char *p);

void main(void)
{
        char *p = x;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
}
Since 65 is outside the ADIW immediate range (0-63), this must be done using a subi/sbci pair.

But the compiler now seems to realize that p is a constant offset to x.
So we now get:

main:
/* prologue: frame size=0 */
        push r16
        push r17
/* prologue end (size=2) */
        lds r16,x
        lds r17,(x)+1
        movw r24,r16
        call foo
        movw r24,r16
        subi r24,lo8(-(65))
        sbci r25,hi8(-(65))
        call foo
        movw r24,r16
        subi r24,lo8(-(130))
        sbci r25,hi8(-(130))

Here x is kept in r16:r17 and the cumulative offset is added into r24:r25.

But if the compiler can realize this... then why not do the same for adds
within the ADIW range?!
So for p++/p+=1 we would get something like:

        movw r24, r16
        adiw r24, 1
        call foo
        movw r24, r16
        adiw r24, 2
etc..

This is just as small as the earlier suggested use of R28!
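
For reference, the source that output corresponds to is just the test above
with the step changed, roughly as sketched here (externals assumed as before,
and only a few of the calls shown); every cumulative offset up to 63 still fits
in ADIW's 6-bit immediate:

extern char *x;
extern void foo(char *p);

void test(void)
{
        char *p = x;
        foo(p); p += 1;   /* cumulative offset 1: within adiw range */
        foo(p); p += 1;   /* cumulative offset 2: within adiw range */
        foo(p); p += 1;   /* cumulative offset 3: ... up to 63 fits */
}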

Wouter









