[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[avr-gcc-list] register allocation - something not right
From: |
HutchinsonAndy |
Subject: |
[avr-gcc-list] register allocation - something not right |
Date: |
Sun, 26 Dec 2004 18:39:54 -0500 |
I have been looking at the gcc avr register allocation and something are not
quite right.
I would welcome feedback on this as I work towards a solution.
It is very common to see sub-optimal allocations that results in redundant
moves. Somelike
reg x = reg y
reg z = nnn
reg y = reg x
often between function calls.
The new-ra options fixes many of these issues and I was hoping this
experimental allocator would eventually find its way in to mainstream gcc -
that appears unlikely. In addition, new-ra also has a bug that get the frame
size wrong and so adds overhead too many zero frame functions.
So I set about trying to find out why the basical allocator was so messed up.
I found part of the issue in the order of register allocation. The default
order being 24,25,18,19,20,21,22,23.... (although it can be changed using
-morder1 or -morder2)
Local register allocation uses the next avialable register from this list.
Allocation will select a register that is not destroyed by call if the required
register needs to survive a call. As r18..r27 are call used
At this time I am not concerned with registers need to survive function calls
so
If the register does not need to survive calls, call used registers are
obviously preferred and indeed indicated by the list.
So what happens is that temporary registers between calls favor r24..r25. This
also permits the result of int functions and the argument of int function to be
maintained in r24..r25, resulting in tight code.
For example:
190:test1.c **** int divi(int a, int b)
191:test1.c **** {
637 .LM57:
638 /* prologue: frame size=0 */
639 /* prologue end (size=0) */
640 026c 0E94 0000 call __divmodhi4
641 /* epilogue: frame size=0 */
642 0270 0895 ret
643 /* epilogue end (size=1) */
644 /* function divi size 3 (2) */
For long or float functions, r22..r25 is used - but gcc still starts at r24
using r24..r27 for long temporary values. So we end up with moves to and from
r22..r27 and r24..r26 as shown in the following example:
190:test1.c **** int divi(int a, int b)
191:test1.c **** {
637 .LM57:
638 /* prologue: frame size=0 */
639 /* prologue end (size=0) */
640 026c 0E94 0000 call __divmodhi4
641 /* epilogue: frame size=0 */
642 0270 0895 ret
643 /* epilogue end (size=1) */
644 /* function divi size 3 (2) */
646 .Lscope8:
648 .stabd 78,0,0
652 .global div
654 div:
655 .stabd 46,0,0
192:test1.c **** return a%b;
193:test1.c **** }
194:test1.c **** float div(float a,float b)
195:test1.c **** {
657 .LM58:
658 /* prologue: frame size=0 */
659 /* prologue end (size=0) */
660 0272 DC01 movw r26,r24
661 0274 CB01 movw r24,r22
662 0276 BC01 movw r22,r24
663 0278 CD01 movw r24,r26
664 027a 0E94 0000 call __divsf3
665 027e DC01 movw r26,r24
666 0280 CB01 movw r24,r22
196:test1.c **** return a/b;
197:test1.c **** }
668 .LM59:
669 0282 BC01 movw r22,r24
670 0284 CD01 movw r24,r26
671 /* epilogue: frame size=0 */
672 0286 0895 ret
There is obviously an issue with gcc failing to realise that r22..r27 DOES have
the right value in the above example and that the moves can be eliminated.
The movement to/from r24 is originally generated by the allocation order. The
failure to optimise this later appears to be due to the overlap of r22..r25 and
r24..r27, (ie register optimisation cant cope with that)
If the same code is compiled using -morder1. The local allocation order then
becomes: 18,19, 20,21, 22,23, 24,25.....
This has no overlap and resulting in far superior code:
194:test1.c **** float div(float a,float b)
195:test1.c **** {
654 .LM58:
655 /* prologue: frame size=0 */
656 /* prologue end (size=0) */
657 026c 0E94 0000 call __divsf3
658 /* epilogue: frame size=0 */
659 0270 0895 ret
660 /* epilogue end (size=1) */
661 /* function div size 3 (2) */
For a wider test I tried butterfly which went from 0x363e bytes to 0x360c
bytes. Some test code that used more long/float operations fell from
1996(dec) to 1956(dec) bytes.
Now there is more to it than I have explained. Overlap and non-optimal placment
still exist is subsequent allocations. So the order of the entire list can be
critical.
Using a custom order ( 22,23,24,25, 18,19....) saw futher reductions to 0x35c4
and 1940(dec) respectively.
Perhaps more is possible but as I tried, I came across other issues that were
clearly countering my attempts to arrive at a logical ordering.
In particular, the order in which registers are used in function calls. For
example, r22..r25 used for a long/float arg/result whereas r24..r25 is used
for integer arg/results.
There would appear to be two issues with this:
i) using r24 immediately splits the bank of call used registers (r18-r27). This
makes subsequent register allocations around a function call more difficult.
Starting at r18 would avoid this.
ii) Values are always EXTENDED into higher numbered registers. So extending
r24..r25 into a long would naturally become r24..r27. However, our functions
use r22..r25 - so we end up with more overlapping moves!.
The optimal order for function parameters and return values would seem to be:
r18..r19,r18..r21 etc. i.e. starting at one end and strictly in "endian" order.
If that can be achieved, the register allocation order should be much easier to
determine and much more productive.
Feedback welcome.
PS for ref gcc 4.0.0 experimental ss12052005 - hacked.
__________________________________________________________________
Switch to Netscape Internet Service.
As low as $9.95 a month -- Sign up today at http://isp.netscape.com/register
Netscape. Just the Net You Need.
New! Netscape Toolbar for Internet Explorer
Search from anywhere on the Web and block those annoying pop-ups.
Download now at http://channels.netscape.com/ns/search/install.jsp
- [avr-gcc-list] register allocation - something not right,
HutchinsonAndy <=