avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"


From: David Brown
Subject: Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
Date: Thu, 9 Feb 2017 15:14:11 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

You could file this as a bug on the website:
<https://savannah.nongnu.org/bugs/?group=avr-libc>

As far as I understand it, the documentation (both on the website and
the Atmel documentation) is generated directly from the library code and
comments - so this would be a change to the library source.

Feel free to quote any parts of my mails on the subject when filing the bug.

mvh.,

David


On 09/02/17 13:11, Marcin Godlewski wrote:
> Dear All,
> 
> The site
> http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html
> still contains buggy description of memory barriers in avr-gcc. As
> this site is popular among avr users I think it's really worth
> fixing. What is more the same inaccurate article is available on
> Atmel doc site:
> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> . Is there anybody subscribed to this mailing list who can contact
> the authors/maintainers of the site in order to discuss correction of
> the content?
> 
> Marcin Godlewski
> 
> W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski
> <address@hidden> napisał:
>> W dniu 2016-12-09 10:11:55 użytkownik David Brown
>> <address@hidden> napisał:
>>> On 08/12/16 21:46, Georg-Johann Lay wrote:
>>>> Marcin Godlewski schrieb:
>>>>> Dear all,
>>>>> 
>>>>> Thanks for the reply to David. However I'm not trying to find
>>>>> a solution for the described issue. What I'm trying to say in
>>>>> this e-mail is that this part of Atmel documentation: 
>>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>>>>>
>>>>> 
is innacurate and should be corrected. The conclusion says:
>>>>> 
>>>>> memory barriers ensure proper ordering of volatile accesses
>>>>> 
>>>>> memory barriers don't ensure statements with no volatile
>>>>> accesses to be reordered across the barrier while it should
>>>>> say:
>>>>> 
>>>>> memory barriers ensure proper ordering of global variables
>>>>> accesses
>>>>> 
>>>>> memory barriers don't ensure local variables accesses to be 
>>>>> reordered across the barrier
>>>> 
>>>> At least the "local" vs. "global" is not completely correct.
>>>> After all it's about memory accesses, and it doesn't matter if
>>>> the memory is local (e.g. local static) or if you are
>>>> dereferencing a pointer (which might point to a local auto or
>>>> to an object on heap).
>>>> 
>>>> The code example you quoted above is actually due to a subtle 
>>>> implementation detail of division, modulo and some other
>>>> arithmetic of GCC's avr backend (the division is _not_ a call
>>>> actually).
>>>> 
>>>> IIRC the solution for me back then was -fno-tree-ter as any
>>>> messing with inline asm doesn't hit the point.
>>> 
>>> Yes, that is the solution you proposed when we discussed it a
>>> good while back (on the avrlibc list, I think).  I disagree with
>>> you somewhat here (as I did then, though I can't remember if we
>>> discussed the details).
>>> 
>>> Changing an optimisation option like this means that code that
>>> looks right, will run as expected - and that is a good thing.
>>> But it also means that the code will /only/ be correct if
>>> particular optimisation flags are set in particular ways.  That
>>> is a very fragile situation, and should always be avoided.  To be
>>> safe, this optimisation would have to be completely disabled in
>>> the avr-gcc port.  I don't know how useful this particular
>>> optimisation is in terms of generating more efficient code,
>>> though from the gcc manual it appears very useful and is enabled 
>>> at -O1.  Clearly that determines the "cost" of this solution to
>>> the re-ordering problem.
>>> 
>>> 
>>> The use of the assembly dependency (or a nicely named macro with
>>> the same effect) fixes the problem in this situation.  It does so
>>> regardless of optimisation options - the compiler is required to
>>> have calculated the result of the division before disabling
>>> interrupts, and cannot re-order the operations.  It does so
>>> without adding any extra assembly code or hindering any
>>> optimisations - it merely forces an order on operations that are
>>> to be done anyway.
>>> 
>>> It has the clear disadvantage of needing extra code in the
>>> user's source.  Like memory barriers, it is a way of giving the
>>> compiler extra information that cannot be expressed in normal C,
>>> and which the compiler cannot (at the moment) figure out for
>>> itself.
>>> 
>>> You say that the assembly dependency does not "hit the point".  I
>>> think you are correct there - it is treating the symptom, not the
>>> disease.  It is not telling the compiler that an operation should
>>> not be re-ordered, or that division is a costly operation.  It
>>> simply tells the compiler that we need the results of that
>>> computation /here/.  But it is a very effective and efficient
>>> cure for this sort of problem.  Unless and until there is some
>>> /safe/ fix in the compiler to avoid this (and I don't count "put
>>> this compiler option in your command line" as safe), I really do
>>> think it is the best we have.
>>> 
>>> 
>>> Note, however, that the "forceDependency" macro only solves half
>>> the problem.  Consider :
>>> 
>>> unsigned int test2b(void) { unsigned int val;
>>> 
>>> cli(); val = ivar; sei(); val = 65535 / val; return val; }
>>> 
>>> In this case, the compiler could move the division backwards
>>> above the sei(), giving a similar problem.  (It did not make the
>>> move in my brief tests - but it /could/ do.)  I don't know if the
>>> -fno-tree-ter flag stops that too, but the forceDependency()
>>> macro is not enough.  The forgetCompilerKnowledge macro is the
>>> answer:
>>> 
>>> unsigned int test2b(void) { unsigned int val;
>>> 
>>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val =
>>> 65535 / val; return val; }
>>> 
>>> This tells the compiler that it needs to stabilise the value of
>>> "val", and it can't assume anything about "val" after this point
>>> in the code, because it /might/ be read and /might/ change in the
>>> assembly code. Again, nothing is actually generated in the
>>> assembly and we are only forcing an ordering on the code.
>>> 
>>> 
>>> Nothing would please me better here here than to have the
>>> compiler understand that users would not want such re-ordering
>>> around cli() and sei(), so that the problem simply goes away.
>>> But it should not require particular choices of compiler flags,
>>> nor should it require disabling useful optimisations and thus
>>> generating poorer code elsewhere.
>>> 
>>> It is also worth noting that though this situation occurs
>>> because division does not work like a normal function call,
>>> despite it using a library call for implementation, there is
>>> nothing fundamental to stop the compiler moving a call to foo()
>>> back or forth across a cli() or sei() as long as the compiler is
>>> sure that no memory is accessed, no volatiles are accessed, and
>>> there are no other externally visible effects in foo().  If the
>>> definition of foo() is available when compiling the code, then
>>> the compiler could well know this to be the case.  If we replace
>>> "val = 65535U / val;" with "val = foo(val);", where we have :
>>> 
>>> unsigned int foo(unsigned int v) { return (v * v) - v; }
>>> 
>>> in the same compilation unit, some or all of the calculation from
>>> foo() will be inlined and mixed with the cli().  Again,
>>> -fno-tree-ter fixes this - at the cost of killing such mixing and
>>> optimisation in cases where it is useful.  And again, the inline
>>> assembly fixes it at the cost of knowing that you have to add
>>> this line of source code.
>>> 
>>> As gcc gets ever smarter with its inlining, function cloning,
>>> link-time optimisations, etc., then this will become more and
>>> more of an issue.
>>> 
>>> 
>>> 
>>> Maybe the answer is that gcc needs an "execution barrier" that
>>> is stronger than a memory barrier, because it will also block
>>> calculations - it would act as a barrier to all local variables.
>>> I cannot think of any way to make such a barrier with inline
>>> assembly or the C11 fences - I think it would take a new
>>> __builtin for gcc.  Such a feature would have use on all embedded
>>> targets, not just AVR.
>>> 
>>> mvh.,
>>> 
>>> David
>>> 
>>> 
>> 
>> I totally agree with you - a feature like "execution barrier" would
>> be very useful. C11 made good job standardizing multi-threading
>> features but unfortunately the features not always fits firmware
>> development. Controlling what exactly goes into critical section is
>> a fundamental problem, so I would even go further - why don't you
>> propose the "execution barrier" as a new feature for the future C
>> language standard?
>> 
>>> 
>>>> 
>>>> Johann
>>>> 
>>>>> I don't know whether this group is the right place to post it
>>>>> however I do not know any better place. Hope someone here can
>>>>> trigger the change of the documentation and I also hope to be
>>>>> corrected if I am wrong.
>>>>> 
>>>>> Thanks and regards, Marcin
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]