[avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny

avr-gcc-list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny

From:	Paul \"LeoNerd\" Evans
Subject:	[avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny
Date:	Sun, 5 Apr 2015 15:35:40 +0100

I'm writing code for an ATtiny (tiny84 specifically).

I currently have an array and a function defined thus:

  static struct {
    uint8_t state;
    uint8_t v1;
  } leds[3];

  uint8_t led_state(uint8_t i) { return leds[i].state; }

Elements of this array are 2 bytes long, and this function compiles to
some fairly sensible looking code:

000003b8 <led_state>:
 3b8:   e8 2f           mov     r30, r24
 3ba:   f0 e0           ldi     r31, 0x00       ; 0
 3bc:   ee 0f           add     r30, r30
 3be:   ff 1f           adc     r31, r31
 3c0:   e1 57           subi    r30, 0x71       ; 113
 3c2:   ff 4f           sbci    r31, 0xFF       ; 255
 3c4:   80 81           ld      r24, Z
 3c6:   08 95           ret

The instruction at 3bc adds r30 to itself, effectively causing it to
contain 2*i, which is the offset in bytes required to be added to the
base address of the `leds` array to find the required field.

If instead I define my array thus:

  static struct {
    uint8_t state;
    uint16_t v1;
  } leds[3];

I get the most horrible:

000003be <led_state>:
 3be:   90 e0           ldi     r25, 0x00       ; 0
 3c0:   63 e0           ldi     r22, 0x03       ; 3
 3c2:   70 e0           ldi     r23, 0x00       ; 0
 3c4:   2d d3           rcall   .+1626          ; 0xa20 <__mulhi3>
 3c6:   81 57           subi    r24,0x71        ; 113
 3c8:   9f 4f           sbci    r25, 0xFF       ; 255
 3ca:   fc 01           movw    r30, r24
 3cc:   80 81           ld      r24, Z
 3ce:   08 95           ret

Here, gcc has given up trying to perform a non power-of-two
multiplication and just inlined a call to the generic __mulhi3 function
instead.

This is somewhat disappointing, as if I further expand my array with a
dummy field for padding, I now get:

000003be <led_state>:
 3be:   e8 2f           mov     r30, r24
 3c0:   f0 e0           ldi     r31, 0x00       ; 0
 3c2:   ee 0f           add     r30, r30
 3c4:   ff 1f           adc     r31, r31
 3c6:   ee 0f           add     r30, r30
 3c8:   ff 1f           adc     r31, r31
 3ca:   e1 57           subi    r30, 0x71       ; 113
 3cc:   ff 4f           sbci    r31, 0xFF       ; 255
 3ce:   80 81           ld      r24, Z
 3d0:   08 95           ret

because it has neatly managed to perform that power-of-two
multiplication, this time of 4*i.

I feel this is a situation that could be resolved better; for instance,
a multiplication by 3 in the middle case could have been performed by

  mov r30, r24
  ldi r31, 0x00    ; Z == i
  add r30, r30
  adc r31, r31     ; Z == 2*i
  add r30, r24
  adc r31, r1      ; Z == 2*i + i == 3*i

in just as many instructions as the 4*i case, without my having to
waste extra bytes of precious RAM just to provide that dummy padding.
Furthermore, I care a lot about instruction counts to access because
this LED management code runs inside a timer match interrupt to
implement software PWM, so it has to be fast and tight.


As a workaround for now, I can restructure my code into two separate
arrays of integers, instead of one array of two-int structs, and thus
ensure that all arrays contain only a power-of-two number of bytes, so
all accesses will be efficient. However, it would be nice if gcc could
manage these multiplications a little nicer.

-----

As another side note, consider the massive difference between

  uint8_t mul10(uint8_t x) { return x * 10; }

 3be:   6a e0           ldi     r22, 0x0A       ; 10
 3c0:   29 d3           rcall   .+1618          ; 0xa14 <__mulqi3>
 3c2:   08 95           ret

vs. something like

  mov r0, r24   ; r0 = x
  add r24, r24  ; r24 = 2*x
  add r24, r24  ; r24 = 4*x
  add r24, r0   ; r24 = 5*x
  add r24, r24  ; r24 = 10*x
  ret

-----

Is there currently underway an attempt to provide better optimisation
of these sorts of cases? If not, where/how could I contribute such?

-- 
Paul "LeoNerd" Evans

address@hidden
http://www.leonerd.org.uk/  |  https://metacpan.org/author/PEVANS

pgpzrumlSJSh3.pgp
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

[avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny, Paul \"LeoNerd\" Evans <=
- Re: [avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny, Georg-Johann Lay, 2015/04/06
- [avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny, Paul \"LeoNerd\" Evans, 2015/04/05

Next by Date: [avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny
Next by thread: Re: [avr-gcc-list] Poor handling of non-power-of-2 multiplication on ATtiny
Index(es):
- Date
- Thread