Re: [avr-gcc-list] Speed challenge...


From: Peter N Lewis
Subject: Re: [avr-gcc-list] Speed challenge...
Date: Fri, 26 Apr 2002 19:52:35 +0800

> Hi. I have a challenge. The (working) code below outputs 19 bits serially from a 4433 uC to another chip using data, clock, and strobe lines. The data comes from 3 lookup arrays of 256 bytes each. At 4MHz on a 4433 uC, it takes about 87us to send all 19 bits. I wonder if there is a faster way of doing the job?

OK, well, firstly, the code is broken into three identical chunks (clearly you block-copied them, given that the comment "send first 8 bits" is repeated three times ;-). So look at just one chunk and optimize that.

 for (i=0; i<=7; i++) {                // send first 8 bits

   __cbi(PORTD,5);                     // set clock line low

   if (mask & data)                    // present data
     __sbi(PORTD,4);
   else
     __cbi(PORTD,4);

   __sbi(PORTD,5);                     // clock the data bit

   mask >>= 1;
 }

Compiles (with -O2) to:

.L9:
/* #APP */
        cbi 18,5
/* #NOAPP */
        mov r24,r18
        and r24,r19
        brne _PC_+2
        rjmp .L7
/* #APP */
        sbi 18,4
/* #NOAPP */
.L8:
/* #APP */
        sbi 18,5
/* #NOAPP */
        lsr r18
        subi r25,lo8(-(1))
        cpi r25,lo8(8)
        brlo .L9

.L7:
/* #APP */
        cbi 18,4
/* #NOAPP */
        rjmp .L8

Several things spring to mind immediately. The if/else is causing a lot of jumps, which is probably not helping. And cbi/sbi each take 2 cycles, compared with 1 cycle for a single out.

Cycle counts:

1 cycle: out,mov,and,lsr,subi,cpi
2 cycles: cbi,sbi,rjmp
brne, breq, brlo: 1 if branch not taken, 2 if branch taken

So currently your loop takes (assuming 4 on and 4 off bits) around

2 + 1 + 1 + average(1 + 2 + 2 + 2, 2 + 2) + 2 + 1 + 1 + 1 + 2

= 16.5 cycles per bit, or about 313 cycles for 19 bits, which at 4 MHz is about 78 us, roughly in line with your timing.
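
As a cross-check of that arithmetic, here is a small host-side C sketch (it runs on a PC, not on the AVR). The helper names are invented for illustration, and the per-pass totals simply re-add the instruction counts listed above; the slightly cheaper final pass, where brlo falls through, is ignored.

```c
/* Host-side sketch only: re-adds the cycle counts quoted for the
   original loop. All names here are hypothetical. */

/* One pass with the data bit set:
   cbi + mov + and + brne(taken) + sbi(data) + sbi(clock)
   + lsr + subi + cpi + brlo(taken) */
enum { CYCLES_BIT_SET = 2 + 1 + 1 + 2 + 2 + 2 + 1 + 1 + 1 + 2 };

/* One pass with the data bit clear:
   cbi + mov + and + brne(not taken) + rjmp + cbi(data) + rjmp
   + sbi(clock) + lsr + subi + cpi + brlo(taken) */
enum { CYCLES_BIT_CLEAR = 2 + 1 + 1 + 1 + 2 + 2 + 2 + 2 + 1 + 1 + 1 + 2 };

/* Total cycles for a pattern with the given number of 1 and 0 bits. */
static unsigned loop_cycles(unsigned ones, unsigned zeros)
{
    return ones * CYCLES_BIT_SET + zeros * CYCLES_BIT_CLEAR;
}

/* Convert a cycle count to microseconds at a clock given in MHz. */
static double cycles_to_us(unsigned cycles, double f_mhz)
{
    return cycles / f_mhz;
}
```

With these numbers, loop_cycles(4, 4) is 132 (16.5 per bit), an all-ones 19-bit pattern comes to 285 cycles and an all-zeros pattern to 342 cycles, i.e. roughly 71 to 86 us at 4 MHz.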

So how to improve it? Each cycle saved in the loop will save roughly 1/16 of the time.

We can get rid of the index counter by using the mask to end the loop, since mask becomes zero once its single set bit has been shifted out:

do {                // send first 8 bits

} while ( mask >>= 1 );
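
To see why the mask alone is enough to end the loop, here is a minimal host-side sketch. It assumes each chunk starts with a single bit set in mask (0x80 for a full byte, which the original post does not show explicitly), and count_passes is a made-up helper name.

```c
#include <stdint.h>

/* Hypothetical helper: counts how many times the loop body would run
   for a given starting mask with one bit set. */
static unsigned count_passes(uint8_t mask)
{
    unsigned passes = 0;

    do {
        passes++;            /* stands in for "send one bit" */
    } while (mask >>= 1);    /* ends once the set bit is shifted out */

    return passes;
}
```

Starting from 0x80 the body runs 8 times; starting from 0x04 it runs 3 times, so the same loop shape also covers a shorter final chunk.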

Next, we need to get rid of the else and the cbi/sbis. We can do this by remembering the value of PORTD in a variable "u08 d;". Just before the first loop, initialize it with "d = inp(PORTD);", and then:

do {                // send first 8 bits

   d &= ~(BV(5)|BV(4)); // clear bit 4 & 5
   if (data & mask) {
     d |= BV(4);
   }
   outb(d, PORTD );      // set clock line low and present the data
   d |= BV(5);
   outb(d, PORTD );      // clock the data bit

} while ( mask >>= 1 );

One consequence of this change is that the data line now changes in the same write that drives the clock low, so it assumes the data is read on the rising edge of the clock. If the data is read "while the clock is high", this code would not work. The assembly now looks like:

.L3:
        andi r25,lo8(-49)
        mov r24,r19
        and r24,r18
        breq .L6
        ori r25,lo8(16)
.L6:
/* #APP */
        out 18,r25
/* #NOAPP */
        ori r25,lo8(32)
/* #APP */
        out 18,r25
/* #NOAPP */
        lsr r18
        brne .L3

This takes 11 cycles per bit (saving about 26 us over the 19 bits).
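
The behaviour of the shadow-variable loop can be checked on a PC with a small simulation. This is only a sketch under stated assumptions: sim_portd stands in for PORTD, sim_out stands in for the port write, the simulated receiver latches the data line on the rising clock edge, and a chunk starts with mask = 0x80. None of these names come from avr-libc.

```c
#include <stdint.h>

#define BV(n)      (1u << (n))
#define DATA_BIT   4
#define CLOCK_BIT  5

static uint8_t sim_portd;     /* stand-in for the PORTD register */
static uint8_t sim_received;  /* bits latched by the simulated receiver */

/* Simulated port write: latch the data bit on a rising clock edge. */
static void sim_out(uint8_t value)
{
    if (!(sim_portd & BV(CLOCK_BIT)) && (value & BV(CLOCK_BIT))) {
        sim_received = (uint8_t)((sim_received << 1) |
                                 ((value >> DATA_BIT) & 1u));
    }
    sim_portd = value;
}

/* Same structure as the loop above, writing to the simulated port. */
static void send_byte(uint8_t data)
{
    uint8_t d = sim_portd;    /* plays the role of d = inp(PORTD) */
    uint8_t mask = 0x80;      /* assumed starting mask, MSB first */

    do {
        d &= (uint8_t)~(BV(CLOCK_BIT) | BV(DATA_BIT)); /* clear bits 4 & 5 */
        if (data & mask) {
            d |= BV(DATA_BIT);
        }
        sim_out(d);           /* clock low and data presented together */
        d |= BV(CLOCK_BIT);
        sim_out(d);           /* rising edge clocks the data bit */
    } while (mask >>= 1);
}
```

Calling send_byte(0xA5) with sim_portd preset to 0x03 leaves sim_received equal to 0xA5 and the other six port bits unchanged, which is the property the real loop relies on.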

Unrolling the loop would save two cycles per bit.
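
A rough idea of what the unrolled version might look like, again as a host-side sketch: SEND_BIT and port_write are made-up names, and port_write just logs the value where the real code would write PORTD. I have not checked what the compiler actually emits for this, so treat the saving as an estimate.

```c
#include <stdint.h>

static uint8_t  port_log[16];   /* records every simulated port write */
static unsigned port_log_len;

/* Stand-in for the write to PORTD. */
static void port_write(uint8_t v)
{
    port_log[port_log_len++] = v;
}

/* One bit of the transfer; bitmask is a compile-time constant, so no
   mask variable, shift or loop branch is needed. */
#define SEND_BIT(d, data, bitmask)                  \
    do {                                            \
        (d) &= (uint8_t)~0x30u;   /* clear 4 & 5 */ \
        if ((data) & (bitmask))                     \
            (d) |= 0x10u;         /* data bit */    \
        port_write(d);            /* clock low */   \
        (d) |= 0x20u;                               \
        port_write(d);            /* clock high */  \
    } while (0)

/* d is the remembered port value, data the byte to send MSB first. */
static void send_byte_unrolled(uint8_t d, uint8_t data)
{
    SEND_BIT(d, data, 0x80);
    SEND_BIT(d, data, 0x40);
    SEND_BIT(d, data, 0x20);
    SEND_BIT(d, data, 0x10);
    SEND_BIT(d, data, 0x08);
    SEND_BIT(d, data, 0x04);
    SEND_BIT(d, data, 0x02);
    SEND_BIT(d, data, 0x01);
}
```

The cost is code size: the body is repeated once per bit instead of once per chunk.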

Converting to assembler could probably save a couple more cycles.

HTH,
   Peter.
--
<http://www.interarchy.com/>  <ftp://ftp.interarchy.com/interarchy.hqx>
avr-gcc-list at http://avr1.org
