Re: [avr-gcc-list] RFC: Speeding up small ISRs: PR20296


From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] RFC: Speeding up small ISRs: PR20296
Date: Sat, 17 Jun 2017 15:41:01 +0200
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Erik Christiansen wrote:

On 15.06.17 14:43, Georg-Johann Lay wrote:
https://gcc.gnu.org/PR20296

is about speeding up "small" ISRs, and has been open for 12 years now...

Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
and that's the major reason why PR20296 is still open (and even
classified "suspended").

In some forum discussion (again!) on that issue, there was the following
proposal to approach that PR:

Reading the PR causes me to infer that moving that code generation out
of gcc into gas is the proposed fix for unmanageable optimisation
complexity in gcc in the use case.

Allow me not to go into GCC details how...

1) Let GCC emit directives / pseudo-instructions in non-naked ISR prologue /
epilogue

If only existing directives and macro invocations are emitted, then the
need to modify gas code is obviated. I.e. if:

  .maybe_isr_prologue 123       ; were instead:
   maybe_isr_prologue 123

What I meant is a pseudo instruction. Something that's handled like an instruction but not available in hardware, indicated by a leading ".". If you don't like that "." just drop it, it's just sugar.

then a gas macro could generate the required code without unnecessary
complexity. If the desired code to be generated from the parameters

Can you make this explicit? How can a macro know whether the following code, hundreds of insns, clobbers tmp_reg or SREG?

Macro expansion runs prior to instruction parse.

supplied can be described, then I'll write the macro(s). After some
iterations, we should have some good results for some useful use cases.

Maybe I don't get your point.  Can you provide such a macro?

2) Let GAS scan the code and replace the directives with code as needed.
Currently,

#include <avr/io.h>
#include <avr/interrupt.h>

ISR (INT0_vect)
{
    __asm ("; Code");
}

emits something like:


__vector_1:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
.L__stack_usage = 3

        ; Code

        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti


which would change to:


__vector_1:
        .maybe_isr_prologue 123
        ;; Rest of prologue

        ; Code

        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti

GAS would then scan the code associated with the function and replace the
.maybe pseudos with the appropriate sequence to save / init / restore
tmp-reg, zero-reg and SREG.

That sets things up handily for finishing by simple macro. Let us say
that gcc emits "_maybe_isr_prologue 1 2 3", then 1 could be the switch
for save, 2 for init, and 3 for restore, if desired. Gas macros readily
handle omission of the last parameter (with it then taking an internally
defined default value), which can be useful if gas knows the default,
and gcc doesn't. Lumping it all into a single parameter would lead to 8
parameter values, just to cover 3 binary switches, IIUC the use case.

The "123" was just intended as a tag (in case that's helpful for GAS). To be more precise, consider

;; Start of ISR1
__vector_1:
        .maybe_isr_prologue 123
        ;; Rest of prologue
        ; CodeA
        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti
        ; CodeB
        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti
        ; CodeC
;; End of ISR1

If CodeA+CodeB+CodeC clobber SREG, Rzero, Rtmp then .maybe_isr_prologue shall expand to

        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__

If CodeA+CodeB+CodeC clobber SREG, Rzero but not Rtmp then .maybe_isr_prologue shall expand to

        push r1
        in r1,__SREG__
        push r1
        clr __zero_reg__

If CodeA+CodeB+CodeC clobber SREG but neither Rzero nor Rtmp then .maybe_isr_prologue shall expand to (this might be optimized further)

        push r0
        in r0,__SREG__
        push r0

If CodeA+CodeB+CodeC clobber Rtmp but neither Rzero nor SREG then .maybe_isr_prologue shall expand to

        push r0

If CodeA+CodeB+CodeC clobber neither Rtmp nor Rzero nor SREG then .maybe_isr_prologue shall expand to

        ;; Empty

etc. and with epilogues (2 in this case) accordingly.
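
The case distinction above can be sketched as a small decision function in C. This is only an illustration, not code from GCC or Binutils: the names isr_variant, isr_select_variant, isr_prologue and isr_epilogue are hypothetical, and the scan that would deliver the three flags is assumed to exist. Two combinations that are not spelled out above are handled by assumption: a needed Rzero always pulls in the SREG save (CLR changes SREG), and SREG plus Rtmp without Rzero reuses the SREG-only sequence because that one saves r0 anyway. The epilogues for the SREG-only and Rtmp-only cases are simply mirrored from their prologues.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the five expansions listed above. */
enum isr_variant { ISR_EMPTY, ISR_TMP, ISR_SREG, ISR_SREG_ZERO, ISR_FULL };

static const char *const isr_prologue_text[] = {
    [ISR_EMPTY]     = "",
    [ISR_TMP]       = "push r0\n",
    [ISR_SREG]      = "push r0\nin r0,__SREG__\npush r0\n",
    [ISR_SREG_ZERO] = "push r1\nin r1,__SREG__\npush r1\nclr __zero_reg__\n",
    [ISR_FULL]      = "push r1\npush r0\nin r0,__SREG__\npush r0\n"
                      "clr __zero_reg__\n",
};

/* Epilogues undo the matching prologue in reverse order. */
static const char *const isr_epilogue_text[] = {
    [ISR_EMPTY]     = "",
    [ISR_TMP]       = "pop r0\n",
    [ISR_SREG]      = "pop r0\nout __SREG__,r0\npop r0\n",
    [ISR_SREG_ZERO] = "pop r1\nout __SREG__,r1\npop r1\n",
    [ISR_FULL]      = "pop r0\nout __SREG__,r0\npop r0\npop r1\n",
};

/* Map the (assumed) scan result for CodeA+CodeB+CodeC to one variant. */
enum isr_variant isr_select_variant(bool sreg, bool zero, bool tmp)
{
    /* Assumption: zero-reg is initialised with CLR, which changes SREG,
       so a needed zero-reg always implies the SREG save as well. */
    if (zero)
        return tmp ? ISR_FULL : ISR_SREG_ZERO;
    if (sreg)
        return ISR_SREG;  /* r0 is saved here anyway, tmp costs nothing extra */
    return tmp ? ISR_TMP : ISR_EMPTY;
}

const char *isr_prologue(enum isr_variant v)
{
    assert((unsigned)v <= ISR_FULL);
    return isr_prologue_text[v];
}

const char *isr_epilogue(enum isr_variant v)
{
    assert((unsigned)v <= ISR_FULL);
    return isr_epilogue_text[v];
}
```

Note that the variant depends only on the flags for the whole function, so every epilogue of the same ISR gets the text that matches its prologue.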


Other registers like R24 are handled by GCC as usual.  For example, if
the scan reveals that tmp-reg is not needed but zero-reg is (which
will imply SREG due to the CLR) the replacement code would be:

__vector_1:
        push r1
        in r1,__SREG__
        push r1
        clr __zero_reg__

        ; Code

        pop r1
        out __SREG__,r1
        pop r1
        reti

An epilogue macro can be made to know whether its matching prologue
saved tmp-reg, even if that is stretching assembler macros slightly.
That would not require any additional code scan. So long as nesting ISRs

Can you give an example? In particular, how the prologue macro gains knowledge about whether tmp-reg needs to be handled or not, i.e. how it draws that information from "Code"?

is illegal, then it would not even clutter the gas symbol table
perceptibly.

ISR nesting is no issue. ISR nesting occurs dynamically, not statically.

Maybe someone is interested in implementing the GAS part, and if that is the
case and the proposal is feasible, I would take care of the GCC part.

I propose that we minimise toolchain modification by choosing an elegant
implementation, based on existing gas capabilities, if feasible. Thus
far, I have not seen any proposed code generation which ought not be
achievable that way.

I've seen such examples, e.g. working around specific kinds of silicon bugs might be considerably less effort in GAS compared to GCC.

Caveats:

a) .L__stack_usage can no longer be computed by GCC

It is no effort for gas to implement lines like: .L__stack_usage = 3
As we know exactly how many bytes we are adding to the stack frame, we
can effortlessly dimension and emit that line - and yes, it is gas which
converts that 'L' into a unique integer. Whatever code later uses all the
.nnnn__stack_usage sizes should continue to work as before.
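
As a rough sketch of the arithmetic involved: every PUSH in the prologue sequences discussed in this thread costs one byte of stack, so the prologue's share of the stack usage follows directly from which of SREG, zero-reg and tmp-reg have to be saved. The function name and its three flags are hypothetical, and whatever GCC pushes for other registers would have to be added on top.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes pushed by a (hypothetical) expanded .maybe_isr_prologue. */
unsigned isr_prologue_stack_bytes(bool sreg, bool zero, bool tmp)
{
    unsigned bytes = 0;

    if (zero)
        sreg = true;        /* CLR of zero-reg changes SREG */

    if (zero)
        bytes += 1;         /* push r1 */
    if (sreg)
        bytes += 1;         /* the pushed copy of SREG */
    if (tmp || (sreg && !zero))
        bytes += 1;         /* push r0: clobbered, or needed to carry SREG */

    assert(bytes <= 3);     /* the full prologue pushes three bytes */
    return bytes;
}
```

For the unmodified prologue shown earlier (all three saved) this yields 3, matching the `.L__stack_usage = 3` line that GCC emits today for that example.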

b) It's hard to find the end of the relevant code.  We might have
interleaved sections (like with dispatch tables), there might be code that
is emitted after the epilogue, there might be more than 1 epilogue,

If gcc doesn't know what it is doing, then gas can't fix that part. ;-)

GAS can fix it in principle, and it would be orders of magnitude less work than in GCC. That's the point of the proposal.

dunno if GAS can infer whether JMP is local or non-local.

If it's one of its recognised local labels. Macro-local symbols may be
defined by a "LOCAL" directive, and local scope symbols by use of '@' or
concatenating a parameter suffix to part-label. There are also the

I am not talking about assembly input. I am talking about GAS internals.

numbered local labels. It knows and uses local symbols beginning with
'L', and omits them from the symbol table, IIRC.

If the JMP destination is external, then ld will handle the linking.
That's well outside the remit of gas. If any relaxation is hoped for,
then that will be also provided by ld, if available.

GAS might also relax, and any pseudo-instruction expansion must not occur after GAS relaxing. IIUC any GAS relaxing is forwarded to LD since --mlink-relax; hence that option would have to be forced always on if pseudo expansion had to run after GAS relaxing.

The .maybe pseudos get function-unique identifiers (123 in the example) so that
GAS knows which epilogue belongs to which prologue, provided that's helpful.

That greatly simplifies the work to be done in gas. Now linking prologue
with epilogue is effortless. But as an ISR will be contained in one
compile unit, and no other ISR will sanely be nested, is that required?

As said, ISR nesting doesn't occur statically.

I am not familiar with Binutils / GAS though and don't know if it's easy to
add the 2 new passes: One to scan and one to replace the .maybe with
appropriate code.
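
Why two passes are needed can be illustrated with a toy model in C: the expansion of the prologue depends on instructions that come after it, so the whole function has to be seen before the first pseudo can be replaced. Everything here is hypothetical and heavily simplified (the struct insn type, the CLOB_* bits, the function names), and it says nothing about how GAS actually represents code internally. In particular, the per-instruction clobber bits are assumed to be known already.

```c
#include <assert.h>
#include <stddef.h>

enum { CLOB_SREG = 1u, CLOB_ZERO = 2u, CLOB_TMP = 4u };

enum insn_kind { INSN_REAL, INSN_MAYBE_PROLOGUE, INSN_MAYBE_EPILOGUE };

/* Hypothetical stand-in for one parsed line of an ISR. */
struct insn {
    enum insn_kind kind;
    int tag;            /* function-unique id of a .maybe pseudo, else unused */
    unsigned clobbers;  /* INSN_REAL only: CLOB_* bits this insn touches      */
    unsigned expand;    /* pseudos only: what to save, set by the 2nd pass    */
};

/* Pass 1: collect what the real instructions of the function touch. */
unsigned isr_scan(const struct insn *code, size_t n)
{
    unsigned seen = 0;

    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (code[i].kind == INSN_REAL)
            seen |= code[i].clobbers;

    if (seen & CLOB_ZERO)
        seen |= CLOB_SREG;  /* CLR of zero-reg changes SREG */
    return seen;
}

/* Pass 2: give every pseudo with the given tag the same scan result.
   Returns the number of pseudos that were resolved. */
size_t isr_replace(struct insn *code, size_t n, int tag, unsigned seen)
{
    size_t resolved = 0;

    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != INSN_REAL && code[i].tag == tag) {
            code[i].expand = seen;
            resolved++;
        }
    }
    return resolved;
}
```

The tag only matters in the second pass, where it ties one prologue to all of its epilogues so that they agree on what was saved.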

Neither appear necessary. Gas makes the substitutions as is.

Huh? Would you make that explicit? Like a macro that catches all 5 situations I outlined above, for any CodeA+CodeB+CodeC?


IIUC GAS only works on sections,

Sorry, no, it only works on compile units, i.e. a source file and its
includes. It knows little else. Its relationship to sections is that it
understands .section directives to the extent that it will put code into
whichever section is named in one. It even has a section stack, so that
it can pop back to the prior section after an excursion into another.

Ya, I know. My questions however are not re. the GAS front, but re. internal working and representation, passes etc.

and the scan would be on BFD internal representation (like relaxing)
after the parser read in the asm sources?

Gas takes assembler source code text as input, and generates an ELF
relocatable object file. It is only ld which can perform relaxation.

That's GAS as a black box. Internally, they are using BFD, no? (Except macro expansion and parse input which are strings of course).

The GCC change would add a new option, configure test whether GAS supports
this, let ISR prologue and epilogue emit new unspec_volatile pseudo insns

Hmmm, "unspec_volatile" doesn't appear in the described use case. If it

unspec_volatile is a GCC internal device. You'll never see it except with internal dumps like with -fdump-rtl-all.

and add a scan pass to detect situations where this is pointless and should
fall back to the old code, like when dispatch tables, calls or a non-local
goto are seen.

The examples provided are within existing gas capability. Let's go with

Okay. I missed that.  I just don't see how to do it.

Johann



