avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t


From: David Brown
Subject: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
Date: Wed, 22 Apr 2009 12:48:52 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

Dale Whitfield wrote:
Hi David                                         >@2009.04.22_10:55:27_+0200

Dale Whitfield wrote:
Hi David                                         >@2009.04.21_19:42:34_+0200

Dale Whitfield wrote:
Hi,

All this talk of super-optimisers, etc. brings this issue I've had,
to the fore again.

Perhaps there are suggestions on how to encourage the compiler to do the
optimisations rather than having to hand-code to get the best result.

Take this code:

volatile uint32_t       status;
static inline void set_status(uint32_t flag, uint8_t set) 
__attribute__((always_inline));

void set_status(uint32_t flag, uint8_t set) {
        if (set) status |= flag;
        else status &= ~flag;
}

Which when used in an interrupt generates:

ISR (INT7_vect)
{
<snip>
That's way too much code... Its fairly obvious where the optimisations
should be, although I can see that some have been done already.

I can't see much possibility for improving the above code (except by removing the push and zeroing of r1). You asked for "status" to be a volatile 32-bit int, so that's what you got. The code below is semantically different, and thus compiles differently.

I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.

Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.

I'm afraid that's how "volatile" works. In a way, by using "volatile" you are telling the compiler "I *know* this makes no sense to you, but do it my way anyway".


Its how volatile works by *loading* the registers before making changes.
However, if its possible to only store back changed bytes, then surely
that's all that needs to be done.


Nope - volatile loads and volatile stores are always done completely, with the whole data element. That's fundamental to "volatile".

Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.

That's what you get when you make a 32-bit item volatile - all accesses to it use the whole 32-bit data, every time.


:-) I guess we need to agree to disagree on this. I was hoping this
wouldn't become a philosophical issue around volatile data. Which is
fraught at the best of times.


You can disagree all you like - but I've got the C standards, every C compiler, every C compiler developer, and almost every C programmer on my side. It's not a matter of philosophy or disagreement any more than a disagreement about wither "0.5" is an int or not.

Suppose you have a 16-bit hardware timer that must always be set low byte first, then high byte. If you make a change that just affects the low byte, you still want the compiler to generate the full 16-bit write - you achieve that by declaring the timer as a 16-bit volatile variable. How is the compiler supposed to guess that you mean something different in this case?


I don't think this is a good example, because processor register
accesses should be treated differently to plain RAM accesses. And, yes,
of course just updating one byte in this scenario would be wrong.


C has no way to differentiate these concepts - it's simply not part of the language. With a perfect embedded systems programming language you could get exactly the type of access control you are asking for - C is not perfect, and does not have these concepts. It's all or nothing - volatile or not volatile.

You highlight an interesting distinction though.
Maybe this is where the optimisation goes astray. For IO access, or
access to processor registers rather, since they may be accessed via lds
sts, the optimisation doesn't apply. However, for RAM access, it does.


There is no difference, there is no "optimisation gone astray". You ask the compiler to use volatile accesses, and it uses volatile accesses.

In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. On an interrupt particularly, this could be
critical time/cycle saving.

I agree with your conclusion - the code is longer and slower than needed. But you are totally wrong on your premise - this is not because the compiler is failing in its optimisation. It is because *your* code forces it to generate long and slow code.

There is no reason for using "volatile" in this situation - making "status" non-volatile will give you exactly the code you want here. But it may then cause trouble for other code, depending on how you use "status" and "set_status".


There is every reason. The flags get updated in interrupt routines and
in non-interrupt code.


That's dependent on parts of your code that I haven't seen, so I can't comment.

However, based on your other post in this thread, you are correct in that volatile access is not what you need - it's atomic access (that applies to 8-bit data as well as 32-bit data).

Try the following version of set_status:

void set_status(uint32_t flag, uint8_t set) {
        if (set) *(uint32_t* &status) |= flag;
        else *(uint32_t* &status) &= ~flag;
        asm("" : : : "memory");
}

First, we remove the "volatile" from status so that the optimiser can give you a single byte access when the function is inlined. Secondly, instead of using "volatile" to protect the access to "status", we use a memory clobber that forces writes to memory variables to be completed before going on. This will ensure that the change is written out to status as soon as possible (otherwise changes could be cached for later storage).


Which would be a problem, since the volatile nature of the variable is
important in this usage.

(I haven't tried compiling that code yet.)



_______________________________________________
AVR-GCC-list mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list






reply via email to

[Prev in Thread] Current Thread [Next in Thread]