avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] Re: Missed optimization or am *I* missing something?


From: David Brown
Subject: [avr-gcc-list] Re: Missed optimization or am *I* missing something?
Date: Thu, 23 Sep 2010 17:24:57 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4

On 23/09/2010 15:57, Paulo Marques wrote:
Weddington, Eric wrote:
Hi Johannes,

Hi, Eric

My best guess is that this is a missed optimization. The compiler is
probably selecting different patterns based on the different code. It
recognizes the redundant adding of zero in the operations below when
WEIRD is 0 and rightly throws them out. When WEIRD is 1, those
operations have to be kept in, and so different patterns are selected,
which are either more optimal, or triggers different kinds of
optimizations to take place afterwards, hence producing better code even
though there are more operations. I noticed that the longer code
generates a lot more code with the pointer registers (Z and X in this case).

Yes, it seems that in the WEIRD=0 case, the compiler thinks the loop
gets simple enough to be unrolled. If you look at the long version,
there is no loop at all in it.


That's what is happening here.

If this is performance-critical code, it could be very much faster with some "hand-optimised" C code. I don't like doing this sort of thing - it's the compiler's job, not the programmer's. But compiler writers are only human, and there are while the compiler could /theoretically/ generate better code, there is a limit to how much the compiler can figure out in practice.

Normally it's best to access arrays as arrays, rather than manually convert them into pointers. But we can get better code here by being a little bit smarter about the array offset calculations. This is especially true on the 8-bit avr, since gcc promotes all array offset arithmetic to "int" at an early stage, making it particularly inefficient.

uint16_t FOO(void) {
        uint16_t boundedBufferValueSum = 0;
        uint8_t offset = DMA_CH0_TRFCNT;
        uint16_t *p = &fooBoundedBuffer[DMA_CH0_TRFCNT];

        for (uint8_t i = 0; i < 4; i++) {
                boundedBufferValueSum += *p++;
                if (++offset >= FOOBUFSIZE) {
                        p -= FOOBUFSIZE;
                }
        }
        return boundedBufferValueSum;
}



I noticed too that the more optimal case below is still not the best
code: it looks like the index variable 'i' in your for loop is being
treated as a 16-bit value, when you explicitly declare it as an 8-bit
unsigned char. Fixing that could save another 3 instructions.

One thing I noticed a long time ago with avr-gcc (I haven't checked more
recent versions for this) is that splitting expressions into simpler
ones helps the compiler detect that it doesn't need to upgrade the
variables to 16 bit int's.

For instance, if you write the code as:

     unsigned char tmp;
...
     offset = DMA.CH0.TRFCNT;
     offset -= WEIRD;
     boundedBufferValueSum = 0;
     for (i = 0; i<  4; i++) {
        tmp = offset;
        tmp += i;
        tmp += WEIRD;
        tmp %= FOOBUFSIZE;
         boundedBufferValueSum += fooBoundedBuffer[tmp];
     }

you'll probably get smaller code. Yes, it's ugly, but if that function
is in some hot path, it might be worth it...






reply via email to

[Prev in Thread] Current Thread [Next in Thread]