[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
From: |
David Brown |
Subject: |
Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code" |
Date: |
Thu, 9 Feb 2017 15:14:11 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 |
You could file this as a bug on the website:
<https://savannah.nongnu.org/bugs/?group=avr-libc>
As far as I understand it, the documentation (both on the website and
the Atmel documentation) is generated directly from the library code and
comments - so this would be a change to the library source.
Feel free to quote any parts of my mails on the subject when filing the bug.
mvh.,
David
On 09/02/17 13:11, Marcin Godlewski wrote:
> Dear All,
>
> The site
> http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html
> still contains buggy description of memory barriers in avr-gcc. As
> this site is popular among avr users I think it's really worth
> fixing. What is more the same inaccurate article is available on
> Atmel doc site:
> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> . Is there anybody subscribed to this mailing list who can contact
> the authors/maintainers of the site in order to discuss correction of
> the content?
>
> Marcin Godlewski
>
> W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski
> <address@hidden> napisał:
>> W dniu 2016-12-09 10:11:55 użytkownik David Brown
>> <address@hidden> napisał:
>>> On 08/12/16 21:46, Georg-Johann Lay wrote:
>>>> Marcin Godlewski schrieb:
>>>>> Dear all,
>>>>>
>>>>> Thanks for the reply to David. However I'm not trying to find
>>>>> a solution for the described issue. What I'm trying to say in
>>>>> this e-mail is that this part of Atmel documentation:
>>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>>>>>
>>>>>
is innacurate and should be corrected. The conclusion says:
>>>>>
>>>>> memory barriers ensure proper ordering of volatile accesses
>>>>>
>>>>> memory barriers don't ensure statements with no volatile
>>>>> accesses to be reordered across the barrier while it should
>>>>> say:
>>>>>
>>>>> memory barriers ensure proper ordering of global variables
>>>>> accesses
>>>>>
>>>>> memory barriers don't ensure local variables accesses to be
>>>>> reordered across the barrier
>>>>
>>>> At least the "local" vs. "global" is not completely correct.
>>>> After all it's about memory accesses, and it doesn't matter if
>>>> the memory is local (e.g. local static) or if you are
>>>> dereferencing a pointer (which might point to a local auto or
>>>> to an object on heap).
>>>>
>>>> The code example you quoted above is actually due to a subtle
>>>> implementation detail of division, modulo and some other
>>>> arithmetic of GCC's avr backend (the division is _not_ a call
>>>> actually).
>>>>
>>>> IIRC the solution for me back then was -fno-tree-ter as any
>>>> messing with inline asm doesn't hit the point.
>>>
>>> Yes, that is the solution you proposed when we discussed it a
>>> good while back (on the avrlibc list, I think). I disagree with
>>> you somewhat here (as I did then, though I can't remember if we
>>> discussed the details).
>>>
>>> Changing an optimisation option like this means that code that
>>> looks right, will run as expected - and that is a good thing.
>>> But it also means that the code will /only/ be correct if
>>> particular optimisation flags are set in particular ways. That
>>> is a very fragile situation, and should always be avoided. To be
>>> safe, this optimisation would have to be completely disabled in
>>> the avr-gcc port. I don't know how useful this particular
>>> optimisation is in terms of generating more efficient code,
>>> though from the gcc manual it appears very useful and is enabled
>>> at -O1. Clearly that determines the "cost" of this solution to
>>> the re-ordering problem.
>>>
>>>
>>> The use of the assembly dependency (or a nicely named macro with
>>> the same effect) fixes the problem in this situation. It does so
>>> regardless of optimisation options - the compiler is required to
>>> have calculated the result of the division before disabling
>>> interrupts, and cannot re-order the operations. It does so
>>> without adding any extra assembly code or hindering any
>>> optimisations - it merely forces an order on operations that are
>>> to be done anyway.
>>>
>>> It has the clear disadvantage of needing extra code in the
>>> user's source. Like memory barriers, it is a way of giving the
>>> compiler extra information that cannot be expressed in normal C,
>>> and which the compiler cannot (at the moment) figure out for
>>> itself.
>>>
>>> You say that the assembly dependency does not "hit the point". I
>>> think you are correct there - it is treating the symptom, not the
>>> disease. It is not telling the compiler that an operation should
>>> not be re-ordered, or that division is a costly operation. It
>>> simply tells the compiler that we need the results of that
>>> computation /here/. But it is a very effective and efficient
>>> cure for this sort of problem. Unless and until there is some
>>> /safe/ fix in the compiler to avoid this (and I don't count "put
>>> this compiler option in your command line" as safe), I really do
>>> think it is the best we have.
>>>
>>>
>>> Note, however, that the "forceDependency" macro only solves half
>>> the problem. Consider :
>>>
>>> unsigned int test2b(void) { unsigned int val;
>>>
>>> cli(); val = ivar; sei(); val = 65535 / val; return val; }
>>>
>>> In this case, the compiler could move the division backwards
>>> above the sei(), giving a similar problem. (It did not make the
>>> move in my brief tests - but it /could/ do.) I don't know if the
>>> -fno-tree-ter flag stops that too, but the forceDependency()
>>> macro is not enough. The forgetCompilerKnowledge macro is the
>>> answer:
>>>
>>> unsigned int test2b(void) { unsigned int val;
>>>
>>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val =
>>> 65535 / val; return val; }
>>>
>>> This tells the compiler that it needs to stabilise the value of
>>> "val", and it can't assume anything about "val" after this point
>>> in the code, because it /might/ be read and /might/ change in the
>>> assembly code. Again, nothing is actually generated in the
>>> assembly and we are only forcing an ordering on the code.
>>>
>>>
>>> Nothing would please me better here here than to have the
>>> compiler understand that users would not want such re-ordering
>>> around cli() and sei(), so that the problem simply goes away.
>>> But it should not require particular choices of compiler flags,
>>> nor should it require disabling useful optimisations and thus
>>> generating poorer code elsewhere.
>>>
>>> It is also worth noting that though this situation occurs
>>> because division does not work like a normal function call,
>>> despite it using a library call for implementation, there is
>>> nothing fundamental to stop the compiler moving a call to foo()
>>> back or forth across a cli() or sei() as long as the compiler is
>>> sure that no memory is accessed, no volatiles are accessed, and
>>> there are no other externally visible effects in foo(). If the
>>> definition of foo() is available when compiling the code, then
>>> the compiler could well know this to be the case. If we replace
>>> "val = 65535U / val;" with "val = foo(val);", where we have :
>>>
>>> unsigned int foo(unsigned int v) { return (v * v) - v; }
>>>
>>> in the same compilation unit, some or all of the calculation from
>>> foo() will be inlined and mixed with the cli(). Again,
>>> -fno-tree-ter fixes this - at the cost of killing such mixing and
>>> optimisation in cases where it is useful. And again, the inline
>>> assembly fixes it at the cost of knowing that you have to add
>>> this line of source code.
>>>
>>> As gcc gets ever smarter with its inlining, function cloning,
>>> link-time optimisations, etc., then this will become more and
>>> more of an issue.
>>>
>>>
>>>
>>> Maybe the answer is that gcc needs an "execution barrier" that
>>> is stronger than a memory barrier, because it will also block
>>> calculations - it would act as a barrier to all local variables.
>>> I cannot think of any way to make such a barrier with inline
>>> assembly or the C11 fences - I think it would take a new
>>> __builtin for gcc. Such a feature would have use on all embedded
>>> targets, not just AVR.
>>>
>>> mvh.,
>>>
>>> David
>>>
>>>
>>
>> I totally agree with you - a feature like "execution barrier" would
>> be very useful. C11 made good job standardizing multi-threading
>> features but unfortunately the features not always fits firmware
>> development. Controlling what exactly goes into critical section is
>> a fundamental problem, so I would even go further - why don't you
>> propose the "execution barrier" as a new feature for the future C
>> language standard?
>>
>>>
>>>>
>>>> Johann
>>>>
>>>>> I don't know whether this group is the right place to post it
>>>>> however I do not know any better place. Hope someone here can
>>>>> trigger the change of the documentation and I also hope to be
>>>>> corrected if I am wrong.
>>>>>
>>>>> Thanks and regards, Marcin
>>>
>>>
>>
>>
>>
>>
>
>
>