avr-gcc-list

Re: [avr-gcc-list] avr superoptimizer


From: Sean D'Epagnier
Subject: Re: [avr-gcc-list] avr superoptimizer
Date: Tue, 21 Apr 2009 16:58:46 -0600

hi,

>
> gcc knows two kinds of peepholes: RTL->RTL (peephole2) and RTL->asm
> (peephole). How would you integrate your work into gcc? You would have
> to map asm to RTL in order to express your rules in terms of RTL. Moreover,
> this mapping is far from being unique: it's pointless to generate
> peepholes whose input is never generated by gcc. How could this missing
> link be established?
>

If I can generate the RTL needed for multiple machines based only on
the assembly, I will know what gcc is looking for.  Also I can insert
some kind of profiling or counter into the RTL and later remove all
the rules gcc never uses.
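
To make the counter idea concrete, here is a rough sketch in C. It is not gcc code: `struct rule`, `rule_hit` and `prune_unused` are hypothetical names made up for illustration. The assumption is that each generated rule bumps a counter whenever it fires, and that after compiling a body of test code the rules with a count of zero are dropped.

```c
#include <stddef.h>

/* One generated peephole rule plus a usage counter.
 * Hypothetical structure for illustration, not part of gcc. */
struct rule {
    const char *name;    /* label for the generated rule */
    unsigned long hits;  /* how many times the rule fired */
};

/* Called whenever a rule matches during compilation. */
static void rule_hit(struct rule *r)
{
    r->hits++;
}

/* Compact the table in place, keeping only rules that fired at
 * least once. Returns the number of rules kept. */
static size_t prune_unused(struct rule *rules, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (rules[i].hits > 0)
            rules[kept++] = rules[i];
    }
    return kept;
}
```

The rules that survive are the ones whose input gcc actually produces, which would be one way to establish the missing link asked about above.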

>> Also, that peephole only works for 32bit numbers correct?  What if
>> there happen to be 2 16 bit ones?  Or even 4 8bit numbers that happen
>> to be able to benefit from this. Also what if you want to load 0x3bd3
>> into the upper and lower half using ldi, ldi, movw?  Currently gcc
>> just does 4 ldis
>
> This is no peephole. It is the routine that prints 32-bit mode regs that
> load zero. Splitting wide types knocks out this function, of course,
> because that tries to break down SImode to HImode/QImode. I cannot say
> what happens for 2*HI and if gcc can be driven to reuse regs whose value
> is known by adapting some cost functions. In fact, I supplied some
> patches that will reuse reg contents on SI operations. This can be done
> for AND, IOR, XOR, CMP, MOV, etc.
> However, the problem is not to write the optimizations. The problem is
> that the patches are rotting somewhere in the web because I didn't get a
> copyright assignment. So no one will ever integrate them in gcc even if
> they work.
>

Please send me the patch so I can look at it if you still have it around.

>> I would specify them as generated and keep them in a separate file.
>
> Perhaps the new gcc plugin framework will open a door? So that the
> functionality could be supplied in a plugin, without all the license
> wrangling that is implied when gcc sources are to be changed...
>

Could be..  I already have a copyright assignment though.

>>> -- I think before adding peepholes we should try to fix the very
>>> problems: maybe missing combine patterns, playing around with command
>>> line options, smarter ways to printout assembler, maybe costs, insn
>>> constraints, see if the bad code still persist in gcc 4.5, etc.
>>
>> Yes I agree, it is better to handcode a few patterns to take care of
>> 90% of the cases than to have a few hundred generated cases.
>>
>>> As I said, IMHO peep2 should be a last resort to fix mess if more
>>> sophisticated and more general approaches fail. I guess a bunch of the
>>> cases you see and treat is just because gcc doesn't handle what it could
>>> have handled if it was described somewhere.
>>
>> Yes, and it's annoying the way gcc is 32bit centric so it means all
>> the patterns have to be duplicated for 8, 16, and 32 bits on avr.
>>
>> Maybe I'm getting carried away, but ideally gcc would figure out how
>> to add 32bit numbers if it knows how to add 8 bit ones.. it should be
>> able to generate multiplication and division routines using its
>> knowledge of the assembly language.. and then it would be trivial to
>> support 24bit integers, or fixed point types of any size for any
>> target (It was a pain writing all the multiplication and division
>> routines for 8 different types of fixed point numbers)  If it could
>> do this type of thing, it would significantly speed up writing new
>> backends since you would only need to define the instruction set to
>> the compiler, not rtl to the instruction set.
>
> There are -- or at least there were -- projects that try to generate a
> cross compiler (resp. machine description to feed a cross compiler)
> directly from the hardware description/specification. But that is still
> a field of science and research.
>

Fine with me.. I like science projects.

> As far as gcc is concerned, it actually *does* expand SImode(32 bits) by
> means of smaller modes like HImode(16 bits) and/or QImode(8 bits) if you
> do not supply SImode/remove SImode from the backend. Note that word_mode
> is HImode resp QImode (depends on -mint8) and Pmode is HImode, too.
>
> However, the code is far from what would please your eyes. Just have a
> look how avr-gcc breaks down 64-bit numbers (DImode) to smaller modes.
> This looks fine for moves and maybe bit operations. But addition is
> horror because there is nothing like an add-with-carry or
> compare-with-carry: The carry will be computed explicitly by means of
> shifts, etc, i.e. extremely expensive. This means that the best way to
> support SImode in gcc is to actually provide this mode -- and this means
> you have to supply it /completely/ and rewrite the difficult parts,
> especially the move instruction which is most painful.
>

gcc needs a description for add, and for add with carry.  Then it can
combine them to form addition for any multiple of 8 bits on avr.  I
noticed how it did it for 64bit types, and I thought about supporting
them directly (the same as HI and SI modes).  For fixed point, it
doesn't even try, it just calls a C function if you don't implement
it.
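
As a rough model of what I mean by combining add and add with carry, here is a sketch in plain C. The names `add8`, `adc8` and `add_bytes` are made up for this example and only imitate the behaviour of the AVR ADD and ADC instructions; nothing here is gcc's API. Operands are assumed to be stored least significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit add: returns the low 8 bits and stores the carry-out
 * (models AVR ADD). */
static uint8_t add8(uint8_t a, uint8_t b, uint8_t *carry)
{
    uint16_t sum = (uint16_t)a + b;
    *carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* 8-bit add with carry-in: consumes and updates the carry
 * (models AVR ADC). */
static uint8_t adc8(uint8_t a, uint8_t b, uint8_t *carry)
{
    uint16_t sum = (uint16_t)a + b + *carry;
    *carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* Add two n-byte little-endian integers: one add for the lowest
 * byte, then add-with-carry for each remaining byte.
 * Returns the final carry-out. */
static uint8_t add_bytes(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    uint8_t carry = 0;
    if (n == 0)
        return 0;
    dst[0] = add8(a[0], b[0], &carry);
    for (size_t i = 1; i < n; i++)
        dst[i] = adc8(a[i], b[i], &carry);
    return carry;
}
```

With n = 3 the same chain gives a 24bit add and with n = 8 a 64bit one, without any shifts to recover the carry.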

> Some backends describe add-carry explicitly on RTL level. However, the
> RTL optimizers don't know anything about it, and things will remain this
> way unless some carry-magic is integrated in gcc. That is not trivial and
> a major change with new standard insns, optimization passes and
> low-level stuff that's far from being trivial.

Ok.. maybe it gets confusing with the vector operations too.. and it
should do much more than add with carry, also should work for
subtraction and multiplication, and other operations.  It would be
nice to make gcc smarter in this way so a lot of backends could be
simplified, and gcc could use better optimizations as well.

Sean



