Re: [avr-libc-dev] User-manual/optimization.html


From: David Brown
Subject: Re: [avr-libc-dev] User-manual/optimization.html
Date: Fri, 19 Jun 2015 17:41:24 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 19/06/15 17:13, Georg-Johann Lay wrote:
> On 06/18/2015 at 02:58 PM, David Brown wrote:
>> Hi,
>>
>> In the user manual:
>>
>> <http://www.nongnu.org/avr-libc/user-manual/optimization.html>
>>
>> there is a discussion about the unexpected code generation from:
>>
>> #define cli() __asm volatile( "cli" ::: "memory" )
>> #define sei() __asm volatile( "sei" ::: "memory" )
>> unsigned int ivar;
>> void test2( unsigned int val )
>> {
>>     val = 65535U / val;
>>     cli();
>>     ivar = val;
>>     sei();
>> }
>>
>> This came up recently in a gcc-help mailing list question - the problem
>> is that the call to __udivmodhi4 may be generated after the cli
>> instruction, disabling interrupts for longer than necessary.  The web
>> page says there is no way to force the desired code generation (with
>> "val" being calculated before "cli").
> 
> If my recollection is right, -fno-tree-ter was a fix, as the code motion
> was performed by that pass.
> 
> Some technical background:  The avr back-end pretends that it implements
> integer division and remainder by providing the corresponding insns, hence
> the middle-end assumes that the division can be performed with a few
> instructions.
> 
> The rationale is that avr-libgcc has many hand-written and hand-optimized
> assembler routines, and many of these routines have a smaller register
> footprint than the ABI requires.  avr-gcc uses this information to
> implement such features (like div) as a transparent library call,
> clobbering all registers that the routine destroys and placing the
> arguments in the expected registers by hand.
> 
> This results in much smaller code, and many functions become leaf
> functions. Without that approach any function using a feature as basic
> as integer multiplication would generate "proper" library calls similar
> to ordinary functions.
> 
> If division were a library call, it wouldn't be moved across the memory
> clobber, but code size would increase considerably.
> 
> 
>> However, there /is/ a way to get the right results - using a fake
>> assembly input to force the calculation:
>>
>> #define cli() __asm volatile( "cli" ::: "memory" )
>> #define sei() __asm volatile( "sei" ::: "memory" )
>> unsigned int ivar;
>> void test2( unsigned int val )
>> {
>>      val = 65535U / val;
>>      asm volatile("" :: "" (val));
>>      cli();
>>      ivar = val;
>>      sei();
>> }
>>
>> The memory clobber on cli() and sei() ensures that no memory operations
>> are moved before or after those statements.  But as already noted, the
>> memory clobber does not affect non-memory operations such as
>> calculations or register-only manipulation.
> 
> The problem is that one has to know the dependencies involved, which is
> usually not the case.  Just consider the case where the cli() is part of
> an inlined function and the division or multiplication is performed by
> the caller.  Or the multiplication is part of an address computation,
> as in  val = list->next->next->next->val.
> 
> 
> My recommendation is to try -fno-tree-ter before cluttering up code with
> ugly patterns.
> 

Judging by the gcc documentation, -fno-tree-ter could result in less
optimal code because it removes optimisation opportunities.  The use of
this asm statement here is ugly (no arguments there), and requires
knowledge of exactly what you want to calculate and when you want it to
be calculated - but it generates optimal code and is, I think,
guaranteed to do the calculation before the cli().  Disabling the
tree-ter optimisation pass may require less knowledge of the code, but
it is not safe - different optimisation settings or future changes to
the compiler may break that "fix".
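
As a rough sketch of how the asm trick could be packaged so that it does
not clutter every call site, the empty asm can be wrapped in a macro and
placed directly next to cli() inside an inline helper.  This also covers
the case where the division is done by the caller.  The names
force_evaluation() and store_ivar_atomic() are made up for this
illustration and are not part of avr-libc.  The sketch uses the "r"
(register) constraint instead of the empty constraint shown above, and the
non-AVR branch exists only so the file also compiles on a host machine:

```c
#if defined(__AVR__)
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
#else
/* Assumption for host-side builds only: there is no interrupt flag to
   touch, so these are reduced to plain compiler memory barriers. */
#define cli() __asm volatile( "" ::: "memory" )
#define sei() __asm volatile( "" ::: "memory" )
#endif

/* Hypothetical helper: an empty asm statement that takes "x" as an
   input, so the compiler has to have the value ready at this point.
   It emits no instructions of its own. */
#define force_evaluation( x ) __asm volatile( "" :: "r" (x) )

unsigned int ivar;

/* Hypothetical inline helper: the barrier sits right before cli(), so a
   caller that computes "val" does not need to add anything itself. */
static inline void store_ivar_atomic( unsigned int val )
{
    force_evaluation( val );    /* the division result must exist here */
    cli();
    ivar = val;
    sei();
}

void test2( unsigned int val )
{
    store_ivar_atomic( 65535U / val );
}
```

Whether the division really ends up before the cli instruction should
still be checked in the generated assembly for the compiler version and
optimisation flags actually in use.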

But I think the best solution would be to add information about both
these tricks to the documentation, so that users can choose the method
that makes most sense for them and the code they are writing.

Ultimately, I suppose, the real solution is to make gcc aware of the
cost of operations such as division, and ensure that high-cost
operations don't move around as much as low-cost ones (while still
keeping functions like this as special to avoid normal function call ABI
overhead).

Best regards,

David



