avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t


From: Dale Whitfield
Subject: Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
Date: Wed, 22 Apr 2009 09:23:49 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

Hi David                                         >@2009.04.21_19:42:34_+0200

> Dale Whitfield wrote:
>> Hi,
>>
>> All this talk of super-optimisers, etc. brings this issue I've had,
>> to the fore again.
>>
>> Perhaps there are suggestions on how to encourage the compiler to do the
>> optimisations rather than having to hand-code to get the best result.
>>
>> Take this code:
>>
>> volatile uint32_t    status;
>> static inline void set_status(uint32_t flag, uint8_t set) 
>> __attribute__((always_inline));
>>
>> void set_status(uint32_t flag, uint8_t set) {
>>      if (set) status |= flag;
>>      else status &= ~flag;
>> }
>>
>> Which when used in an interrupt generates:
>>
>> ISR (INT7_vect)
>> {
>>     4e9a:    1f 92           push    r1
>>     4e9c:    0f 92           push    r0
>>     4e9e:    0f b6           in      r0, 0x3f        ; 63
>>     4ea0:    0f 92           push    r0
>>     4ea2:    11 24           eor     r1, r1
>>     4ea4:    8f 93           push    r24
>>     4ea6:    9f 93           push    r25
>>     4ea8:    af 93           push    r26
>>     4eaa:    bf 93           push    r27
>>     4eac:    80 91 1a 18     lds     r24, 0x181A
>>     4eb0:    90 91 1b 18     lds     r25, 0x181B
>>     4eb4:    a0 91 1c 18     lds     r26, 0x181C
>>     4eb8:    b0 91 1d 18     lds     r27, 0x181D
>>     4ebc:    a0 64           ori     r26, 0x40       ; 64
>>     4ebe:    80 93 1a 18     sts     0x181A, r24
>>     4ec2:    90 93 1b 18     sts     0x181B, r25
>>     4ec6:    a0 93 1c 18     sts     0x181C, r26
>>     4eca:    b0 93 1d 18     sts     0x181D, r27
>>      set_status(0x00400000UL, 1);
>>     4ed0:    bf 91           pop     r27
>>     4ed2:    af 91           pop     r26
>>     4ed4:    9f 91           pop     r25
>>     4ed6:    8f 91           pop     r24
>>     4ed8:    0f 90           pop     r0
>>     4eda:    0f be           out     0x3f, r0        ; 63
>>     4edc:    0f 90           pop     r0
>>     4ede:    1f 90           pop     r1
>>     4ee0:    18 95           reti
>>
>> That's way too much code... Its fairly obvious where the optimisations
>> should be, although I can see that some have been done already.
>>
>
> I can't see much possibility for improving the above code (except by  
> removing the push and zeroing of r1).  You asked for "status" to be a  
> volatile 32-bit int, so that's what you got.  The code below is  
> semantically different, and thus compiles differently.
>

I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.

Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.

Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.

In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. On an interrupt particularly, this could be
critical time/cycle saving.

>> If I recode the routine as follows, I get what I want:
>>
>> void set_status(uint32_t flag, uint8_t set) {
>>      volatile uint8_t *pstatus = (uint8_t *)&status;
>>
>>      if (set) {
>>              if (flag & 0x000000ff)
>>                      pstatus[0] |= (uint8_t)(flag & 0xff);
>>              if (flag & 0x0000ff00)
>>                      pstatus[1] |= (uint8_t)(flag >> 8);
>>              if (flag & 0x00ff0000)
>>                      pstatus[2] |= (uint8_t)(flag >> 16);
>>              if (flag & 0xff000000)
>>                      pstatus[3] |= (uint8_t)(flag >> 24);
>>      } else {
>>              if (flag & 0x000000ff)
>>                      pstatus[0] &= ~(uint8_t)(flag & 0xff);
>>              if (flag & 0x0000ff00)
>>                      pstatus[1] &= ~(uint8_t)(flag >> 8);
>>              if (flag & 0x00ff0000)
>>                      pstatus[2] &= ~(uint8_t)(flag >> 16);
>>              if (flag & 0xff000000)
>>                      pstatus[3] &= ~(uint8_t)(flag >> 24);
>>      }
>> }
>>
>> Which is:
>>
>> ISR (INT7_vect)
>> {
>>     6b80:    1f 92           push    r1
>>     6b82:    0f 92           push    r0
>>     6b84:    0f b6           in      r0, 0x3f        ; 63
>>     6b86:    0f 92           push    r0
>>     6b88:    11 24           eor     r1, r1
>>     6b8a:    8f 93           push    r24
>>     6b8c:    80 91 4e 18     lds     r24, 0x184E
>>     6b90:    80 64           ori     r24, 0x40       ; 64
>>     6b92:    80 93 4e 18     sts     0x184E, r24
>>      set_status(0x00400000UL, 1);
>>     6b98:    8f 91           pop     r24
>>     6b9a:    0f 90           pop     r0
>>     6b9c:    0f be           out     0x3f, r0        ; 63
>>     6b9e:    0f 90           pop     r0
>>     6ba0:    1f 90           pop     r1
>>     6ba2:    18 95           reti
>>
>> Compiler options are:
>>
>> avr-gcc  -mmcu=atmega2560 -Wall -gdwarf-2 -std=gnu99 -Os -funsigned-char 
>> -funsigned-bitfields -fpack-struct -fshort-enums -finline-limit=10 
>> -DF_CPU=7372800UL
>>
>> version 4.3.3
>>
>> In a 75k byte application this reduces code size by 1632 bytes.
>>
>> Dale.
>
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]