[avr-gcc-list] Missed optimisations

From:	David Brown
Subject:	[avr-gcc-list] Missed optimisations
Date:	Mon, 14 Jan 2008 15:00:59 +0100
User-agent:	Thunderbird 2.0.0.9 (Windows/20071031)

I've just tried out the crc code I posted recently on avr-chat:

> static const uint8_t PROGMEM crcTable8n[16] = {
>         0x00, 0x8D, 0x97, 0x1A, 0xA3, 0x2E, 0x34, 0xB9,
>         0xCB, 0x46, 0x5C, 0xD1, 0x68, 0xE5, 0xFF, 0x72
> };
>
> uint8_t CrcByte8n(uint8_t crc, uint8_t data) {
>         // Fold next byte into CRC
>         crc = (crc << 4) ^ (pgm_read_byte(&crcTable8n[(crc >> 4) ^ (data >> 4)]));
>         crc = (crc << 4) ^ (pgm_read_byte(&crcTable8n[(crc >> 4) ^ (data & 0x0f)]));
>
>         return crc;
> }
>

The generated assembly from avr-gcc 4.2.2 (latest WinAVR release) with -Os is:


  40                    CrcByte8n:
  41                    /* prologue: frame size=0 */
  42                    /* prologue end (size=0) */
  43 0000 982F                  mov r25,r24      ;  tmp48, crc
  44 0002 9627                  eor r25,r22      ;  tmp48, data
  45 0004 9295                  swap r25         ;  tmp48
  46 0006 9F70                  andi r25,lo8(15)         ;  tmp48,
  47 0008 20E0                  ldi r18,lo8(crcTable8n)  ;  tmp51,
  48 000a 30E0                  ldi r19,hi8(crcTable8n)  ;  tmp51,
  49 000c F901                  movw r30,r18     ;  tmp52, tmp51
  50 000e E90F                  add r30,r25      ;  tmp52, tmp48
  51 0010 F11D                  adc r31,__zero_reg__     ;  tmp52
  52                    /* #APP */
  53 0012 E491                  lpm r30, Z       ;  __result
  54                            
  55                    /* #NOAPP */
  56 0014 8295                  swap r24         ;  crc.36
  57 0016 807F                  andi r24,lo8(-16)        ;  crc.36,
  58 0018 8E27                  eor r24,r30      ;  crc.36, __result
  59 001a E82F                  mov r30,r24      ;  tmp55, crc.36
  60 001c E295                  swap r30         ;  tmp55
  61 001e EF70                  andi r30,lo8(15)         ;  tmp55,
  62 0020 F0E0                  ldi r31,lo8(0)   ; ,
  63 0022 70E0                  ldi r23,lo8(0)   ;  data,
  64 0024 6F70                  andi r22,lo8(15)         ;  data,
  65 0026 7070                  andi r23,hi8(15)         ;  data,
  66 0028 E627                  eor r30,r22      ;  tmp56, data
  67 002a F727                  eor r31,r23      ;  tmp56, data
  68 002c E20F                  add r30,r18      ;  tmp56, tmp51
  69 002e F31F                  adc r31,r19      ;  tmp56, tmp51
  70                    /* #APP */
  71 0030 E491                  lpm r30, Z       ;  __result
  72                            
  73                    /* #NOAPP */
  74 0032 8295                  swap r24         ;  crc.36
  75 0034 807F                  andi r24,lo8(-16)        ;  crc.36,
  76 0036 8E27                  eor r24,r30      ;  crc.36, __result
  77 0038 90E0                  ldi r25,lo8(0)   ;  <result>,
  78                    /* epilogue: frame size=0 */
  79 003a 0895                  ret

The code is basically good (the "swap" instruction is used for the shifts, which is very nice - a big improvement over the older 3.4 gcc), but there are a few missed optimisations shown here that are probably quite common in other code.

Why is the address of crcTable8n loaded into r18:r19 first, before being copied into r30:r31 for the address calculation? It seems that this happens when the address is reused - if it is not reused, then r30:r31 are loaded directly. However, the reuse does not benefit from having the address in a register - the "add r30, r18" and "adc r31, r19" on lines 68 and 69 could be replaced with subi and sbci instructions to save space and time, and to free registers r18:r19. On most RISC CPUs, storing the address in a register for reuse would be a benefit, which is probably why this code is generated - on the AVR, it is not helpful (at least, not here).
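
To spell out the subi/sbci suggestion: the AVR has no general add-immediate instruction, so a constant address is normally added by subtracting its negation, along the lines of "subi r30,lo8(-(crcTable8n))" and "sbci r31,hi8(-(crcTable8n))". The following is only a host-side sketch in C that models those two byte operations and checks them against ordinary 16-bit addition; the function names are made up for illustration and nothing here is taken from the compiler.

```c
#include <stdint.h>

/* Models "subi Rl, lo8(-k)" followed by "sbci Rh, hi8(-k)" acting on a
   16-bit register pair that initially holds z. Returns the pair afterwards.
   (Hypothetical helper, for illustration only.) */
uint16_t add_const_via_subi_sbci(uint16_t z, uint16_t k)
{
    uint16_t neg_k = (uint16_t)(0u - k);      /* two's-complement negation of k */
    uint8_t k_lo = (uint8_t)(neg_k & 0xFF);   /* lo8(-k) */
    uint8_t k_hi = (uint8_t)(neg_k >> 8);     /* hi8(-k) */
    uint8_t z_lo = (uint8_t)(z & 0xFF);
    uint8_t z_hi = (uint8_t)(z >> 8);

    /* subi: subtract the low byte, remembering the borrow (carry flag) */
    uint8_t borrow = (z_lo < k_lo) ? 1 : 0;
    z_lo = (uint8_t)(z_lo - k_lo);

    /* sbci: subtract the high byte and the borrow */
    z_hi = (uint8_t)(z_hi - k_hi - borrow);

    return (uint16_t)(((uint16_t)z_hi << 8) | z_lo);
}

/* Compares the model with plain 16-bit addition for every z and a given k.
   Returns the number of mismatches found. */
unsigned count_add_mismatches(uint16_t k)
{
    unsigned mismatches = 0;
    for (uint32_t z = 0; z <= 0xFFFF; z++) {
        if (add_const_via_subi_sbci((uint16_t)z, k) != (uint16_t)(z + k))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_add_mismatches() with any table address should return zero, which is why the address does not need to live in r18:r19 for the second lookup.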

Secondly, the "(data & 0x0f)" clause generates messy 16-bit code. I realise C requires integer promotion in such cases, but it's important to try to remove unnecessary code such as loading the high register with zero, then anding it with zero, then eoring it. gcc version 3.4.6 was sometimes marginally better at such code. It should be noted that the quality of the generated code depends very much on the exact expression - the original "[(crc >> 4) ^ (data & 0x0f)]" generates poor code, while the equivalent "[((crc >> 4) ^ data) & 0x0f]" generates tight code.
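
That the two index expressions really are equivalent can be checked exhaustively on a PC. This is a minimal sketch, assuming a plain array in ordinary memory instead of PROGMEM and pgm_read_byte so that it builds with any hosted C compiler; the names crc_byte_original, crc_byte_rewritten and count_mismatches are invented for the example.

```c
#include <stdint.h>

/* Same table as above, but in ordinary memory so it builds on a host
   (assumption: no PROGMEM / pgm_read_byte here). */
static const uint8_t crcTable8n[16] = {
    0x00, 0x8D, 0x97, 0x1A, 0xA3, 0x2E, 0x34, 0xB9,
    0xCB, 0x46, 0x5C, 0xD1, 0x68, 0xE5, 0xFF, 0x72
};

/* Original form: the second lookup masks data before the xor. */
uint8_t crc_byte_original(uint8_t crc, uint8_t data)
{
    crc = (uint8_t)((crc << 4) ^ crcTable8n[(crc >> 4) ^ (data >> 4)]);
    crc = (uint8_t)((crc << 4) ^ crcTable8n[(crc >> 4) ^ (data & 0x0f)]);
    return crc;
}

/* Rewritten form: the second lookup masks after the xor. */
uint8_t crc_byte_rewritten(uint8_t crc, uint8_t data)
{
    crc = (uint8_t)((crc << 4) ^ crcTable8n[(crc >> 4) ^ (data >> 4)]);
    crc = (uint8_t)((crc << 4) ^ crcTable8n[((crc >> 4) ^ data) & 0x0f]);
    return crc;
}

/* Counts the (crc, data) pairs, out of all 256 * 256, where the two
   forms give different results. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned crc = 0; crc < 256; crc++) {
        for (unsigned data = 0; data < 256; data++) {
            if (crc_byte_original((uint8_t)crc, (uint8_t)data) !=
                crc_byte_rewritten((uint8_t)crc, (uint8_t)data))
                mismatches++;
        }
    }
    return mismatches;
}
```

The two forms agree because "crc >> 4" already has a zero high nibble for an 8-bit crc, so masking data before or after the xor leaves the same low four bits.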

Running the same code in C++ mode produces worse results - there are more promotions to 16-bit which are not removed by the optimiser. I realise there are no maintainers for the C++ port, and the C++ specific libraries are not implemented, but is there a particular problem that is causing this difference?

I've looked through the bug lists, and can't see any of these points in existing bug reports, although there may be some connection with http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259



Best regards,

David
