From: Nicholas Vinen
Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
Date: Sun, 01 Mar 2009 00:43:22 +1100
User-agent: Thunderbird 2.0.0.19 (X11/20090103)

Georg-Johann Lay wrote:
> If you are sure it is really some performance issue/regression and not
> due to some language standard implication, you can add a report to

OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

    #include <avr/io.h>

    int main(void)
    {
        unsigned long packet = 0;
        while(1) {
            if( !(PINC & _BV(PC2)) ) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

avr-gcc is:

    avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing. I tried compiling it with both -Os and -O2:

    avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

    lsl r18            ; packet          ; 50 *ashlsi3_const/2 [length = 4]
    rol r19            ; packet
    rol r20            ; packet
    rol r21            ; packet
    in r24,38-0x20     ; D.1214,         ; 16 *movqi/4 [length = 1]
    lsr r24            ; D.1214          ; 17 lshrqi3/3 [length = 1]
    ldi r25,lo8(0)     ; ,               ; 48 *movqi/2 [length = 1]
    ldi r26,lo8(0)     ; ,               ; 46 *movhi/4 [length = 2]
    ldi r27,hi8(0)     ; ,
    andi r24,lo8(1)    ; tmp52,          ; 19 andsi3/2 [length = 4]
    andi r25,hi8(1)    ; tmp52,
    andi r26,hlo8(1)   ; tmp52,
    andi r27,hhi8(1)   ; tmp52,
    or r18,r24         ; packet, tmp52   ; 20 iorsi3/1 [length = 4]
    or r19,r25         ; packet, tmp52
    or r20,r26         ; packet, tmp52
    or r21,r27         ; packet, tmp52

The problem, it seems, is that the compiler doesn't realize that the right hand side of the expression can only have any non-zero values in the bottom 8 bits, since it's an unsigned char which is being implicitly expanded to 32 bits for the or operation. In fact, it's only the bottom bit that's ever non-zero. As a result it's spending a number of cycles and registers doing useless things.

I'll copy a report to the locations you mention in your e-mail. There are probably ways to work around this, such as making "packet" a union of an unsigned char and a long, then shifting the long and only ORing in the unsigned char.

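A minimal sketch of that union workaround, written so it can be tried on a host compiler as well as with avr-gcc. The names packet_t, shift_in_plain and shift_in_union are made up for illustration, uint32_t stands in for the 32-bit unsigned long on AVR, and the pin value is passed in as a parameter instead of being read from PINC. It assumes a little-endian target (which avr-gcc is), so that the byte member overlays bits 0..7 of the word. Whether this actually produces better code on a given avr-gcc version would have to be checked in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper type: `low` overlays the least significant byte
   of `word`. This relies on a little-endian target such as AVR. */
typedef union {
    uint32_t word;
    uint8_t  low;
} packet_t;

/* Original form: the 1-bit value is widened to 32 bits before the OR. */
uint32_t shift_in_plain(uint32_t packet, uint8_t pins)
{
    return (packet << 1) | (uint32_t)((pins >> 1) & 1);
}

/* Union form: shift the full 32-bit word, then OR the new bit into the
   low byte only, so no 32-bit AND/OR is needed in the source. */
uint32_t shift_in_union(uint32_t packet, uint8_t pins)
{
    packet_t p;
    p.word = packet;
    p.word <<= 1;                          /* bit 0 is now zero        */
    p.low |= (uint8_t)((pins >> 1) & 1);   /* insert bit 1 of the pins */
    return p.word;
}
```

In the test program above, the body of the if would then become something like p.word <<= 1; p.low |= ((unsigned char)PINC >> 1) & 1; with p declared as the union instead of a plain unsigned long.
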
I'll note that there's also an optimization to be had with the right hand side of the expression. I would write the assembly something like this:

    lsl r18
    rol r19
    rol r20
    rol r21
    in r24,38-0x20
    bst r24, 1
    bld r18, 0

I'm sure I can find other examples of poor code generation in this particular file, since I remember coming across many cases where I replaced the generated code with inline assembly when I was originally working on it, but that will have to wait for later.

Thanks for your help, I appreciate it. As I said avr-gcc is pretty good, but I would love it if it could get even better :)

Nicholas