Re: [avr-gcc-list] Re: C vs. assembly performance


From: Nicholas Vinen
Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
Date: Sun, 01 Mar 2009 00:43:22 +1100
User-agent: Thunderbird 2.0.0.19 (X11/20090103)

Georg-Johann Lay wrote:
If you are sure it is really some performance issue/regression and not due to some language standard implication, you can add a report to
    http://sourceforge.net/tracker/?group_id=68108

so that the subject won't be forgotten. Also mind
    http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful if you can manage to simplify the source to a small piece of code that triggers the problem and allows others to reproduce the problem. (i.e. no #include in the code, no ... (except for varargs), a.s.o).

Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even bad code, so feedback is welcome.

But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to ask questions on internals then and when...)

OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}

The compiler is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io.h include is there only so that the I/O register accesses stop the compiler from optimising the code away to nothing.

I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

	lsl r18	 ;  packet	 ;  50	*ashlsi3_const/2	[length = 4]
	rol r19	 ;  packet
	rol r20	 ;  packet
	rol r21	 ;  packet
	in r24,38-0x20	 ;  D.1214,	 ;  16	*movqi/4	[length = 1]
	lsr r24	 ;  D.1214	 ;  17	lshrqi3/3	[length = 1]
	ldi r25,lo8(0)	 ; ,	 ;  48	*movqi/2	[length = 1]
	ldi r26,lo8(0)	 ; ,	 ;  46	*movhi/4	[length = 2]
	ldi r27,hi8(0)	 ; ,
	andi r24,lo8(1)	 ;  tmp52,	 ;  19	andsi3/2	[length = 4]
	andi r25,hi8(1)	 ;  tmp52,
	andi r26,hlo8(1)	 ;  tmp52,
	andi r27,hhi8(1)	 ;  tmp52,
	or r18,r24	 ;  packet, tmp52	 ;  20	iorsi3/1	[length = 4]
	or r19,r25	 ;  packet, tmp52
	or r20,r26	 ;  packet, tmp52
	or r21,r27	 ;  packet, tmp52


The problem, it seems, is that the compiler doesn't realize that the right-hand side of the expression can have non-zero bits only in the bottom 8 bits, since it is an unsigned char that is implicitly widened to 32 bits for the OR operation. In fact, only the bottom bit can ever be non-zero. As a result it spends a number of cycles and registers doing useless things: it clears r25 to r27, ANDs all four bytes with the constant, and then ORs all four bytes into packet. I'll copy a report to the locations you mention in your e-mail.
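
To illustrate the point, here is a minimal host-side sketch (not part of the original test case). The helper names and the pinc parameter, which stands in for a read of PINC, are made up for this illustration. It shows that the generic expression and a version that touches only bit 0 after the shift compute the same value:

```c
#include <stdint.h>

/* Hypothetical helper: one loop iteration of the test case, with the
   pin register value passed in so it can run on a host. */
static uint32_t shift_in_generic(uint32_t packet, uint8_t pinc)
{
    /* Same shape as the original expression: the right-hand side is
       widened to 32 bits before the OR, but only bit 0 can be set. */
    return (packet << 1) | ((pinc >> 1) & 1);
}

/* Hypothetical alternative: shift all 32 bits, then set bit 0 only
   when bit 1 of the pin value is high. */
static uint32_t shift_in_bit0(uint32_t packet, uint8_t pinc)
{
    packet <<= 1;
    if (pinc & 0x02)
        packet |= 1;
    return packet;
}
```

Whether avr-gcc 4.3.3 generates better code for the second form has not been checked here; the sketch only demonstrates that the upper three bytes never need to take part in the AND or the OR.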


There are probably ways to work around this, such as making "packet" a union of an unsigned char and a long, then shifting the long and ORing in only the unsigned char. I'll note that there's also an optimization to be had on the right-hand side of the expression. I would write the assembly something like this:

	lsl r18
	rol r19
	rol r20
	rol r21
	in r24,38-0x20
	bst r24, 1
	bld r18, 0
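
The union workaround mentioned above could look roughly like this in C. This is only a sketch that has not been compiled for AVR: the type and function names are made up for illustration, the pin value is passed in as a parameter instead of reading PINC, and it assumes a little-endian layout (as on AVR) so that the byte member overlays the least significant byte of the 32-bit member.

```c
#include <stdint.h>

/* Assumption: little-endian target, so `low` aliases the least
   significant byte of `word`. */
typedef union {
    uint32_t word;
    uint8_t  low;
} packet_t;

/* Hypothetical helper: shift the 32-bit value left by one, then OR
   the new bit into the low byte only. */
static uint32_t shift_in_union(uint32_t packet, uint8_t pinc)
{
    packet_t p;

    p.word = packet;
    p.word <<= 1;                          /* 32-bit shift, bit 0 becomes 0 */
    p.low |= (uint8_t)((pinc >> 1) & 1);   /* 8-bit OR of the new bit */
    return p.word;
}
```

The idea is that the OR is now an 8-bit operation, so the compiler has no reason to widen the operand to 32 bits. Whether avr-gcc then emits something close to the hand-written assembly would need to be checked in the generated .s file.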

I'm sure I can find other examples of poor code generation in this particular file, since I remember coming across many cases where I replaced the generated code with inline assembly when I was originally working on it, but that will have to wait for later.


Thanks for your help, I appreciate it. As I said, avr-gcc is pretty good, but I would love it if it could get even better :)



Nicholas

