avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Small super-optimizer results


From: Paulo Marques
Subject: Re: [avr-gcc-list] Small super-optimizer results
Date: Thu, 22 Sep 2011 13:01:48 +0100
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Georg-Johann Lay wrote:
> Paulo Marques schrieb:
>> Hi, all
>>
>> Some time ago I toyed with trying to build a super-optimizer for the
>> avr. The project didn't went very far, but it produced a few small bits
>> of code that I think are worth sharing.
>>
>> Basically I was trying to produce functions to multiply an 8 bit value
>> by 8 bit constants with a 16 bit result in avr families without a mul
>> instruction.
> 
> Hi Paulo,

Hi, Georg

> can your optimizer produce results for other arithmetic than
> multiplication?

Yes, with a few tweaks.

> For example to get a sequence for
> 
>    R >>>= 2
> 
> where R is an 8-bit register and >>> stands for "rotate right".
> There is a 5-insn sequence
> 
>    SWAP R
>    LSL  R
>    ADC  R,Z
>    LSL  R
>    ADC  R,Z
> 
> where Z means "a register containing 0", i.e. >>> 2 is performed as
>    <<< 4
>    <<< 1
>    <<< 1
> 
> With MUL instruction there is
> 
>    LDI  U,64
>    MUL  R,U
>    OR   R0,R1
>    MOV  R,R0
> 
> but that needs an upper register U and for avr-gcc the zero-register R1
> must be cleared:
> 
>    CLR  Z
> 
> so that this sequence is not shorter but only slower.  The question is
> if there exists a shorter sequence and maybe your tool knows some magic
> to give the answer?

The tool can not give a definite answer, because it only uses a subset
of all available instructions (and it might have bugs).

I tweaked it for the 2 bit rotation, and with the subset I usually use
it couldn't find any sequence better than yours: the best sequences used
5 instructions and 2 registers.

I then increased the subset of instructions to give it more options, but
it still could not find a smaller sequence... :(

It did found a lot of 5 instruction sequences. Here are a few examples:

MOV     R2, R
ASR     R2
ROR     R
ASR     R2
ROR     R

This also uses an extra register, but doesn't require it to be
initialized to zero. Since we have a zero register available most of the
time, this is really no advantage over your version.

ADD     R, R
ADC     R, Z
ADD     R, R
ADC     R, Z
SWAP    R

This is basically the same as yours, although the SWAP being done in the
end makes the code harder to follow ;)

I've attached the output of the optimizer. The R1 register is assumed to
have zero when starting and R0 has the value to be shifted. As you can
see many of the 81 sequences are equivalent.

If you want me to try different problems, just let me know. I'm always
curious about what kind of weird sequences the optimizer will produce :)

-- 
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

I have not yet begun to procrastinate
ADC R0 R0
ADC R0 R1
ADC R0 R0
ADC R0 R1
SWAP R0

ADC R0 R0
ADC R0 R1
ADD R0 R0
ADC R0 R1
SWAP R0

ADC R0 R0
ADC R0 R1
SWAP R0
ADC R0 R0
ADC R0 R1

ADC R0 R0
ADC R0 R1
SWAP R0
ADD R0 R0
ADC R0 R1

ADC R1 R0
ASR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADC R1 R0
ASR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADC R1 R0
ASR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

ADC R1 R0
LSR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADC R1 R0
LSR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADC R1 R0
LSR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

ADC R1 R0
ROR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADC R1 R0
ROR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADC R1 R0
ROR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1
.
ADD R0 R0
ADC R0 R1
ADC R0 R0
ADC R0 R1
SWAP R0

ADD R0 R0
ADC R0 R1
ADD R0 R0
ADC R0 R1
SWAP R0

ADD R0 R0
ADC R0 R1
SWAP R0
ADC R0 R0
ADC R0 R1

ADD R0 R0
ADC R0 R1
SWAP R0
ADD R0 R0
ADC R0 R1

ADD R1 R0
ASR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADD R1 R0
ASR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADD R1 R0
ASR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

ADD R1 R0
LSR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADD R1 R0
LSR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADD R1 R0
LSR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

ADD R1 R0
ROR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

ADD R1 R0
ROR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

ADD R1 R0
ROR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1
..
EOR R1 R0
ASR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

EOR R1 R0
ASR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

EOR R1 R0
ASR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

EOR R1 R0
LSR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

EOR R1 R0
LSR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

EOR R1 R0
LSR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

EOR R1 R0
ROR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

EOR R1 R0
ROR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

EOR R1 R0
ROR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

MOV R1 R0
ASR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

MOV R1 R0
ASR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

MOV R1 R0
ASR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

MOV R1 R0
LSR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

MOV R1 R0
LSR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

MOV R1 R0
LSR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

MOV R1 R0
ROR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

MOV R1 R0
ROR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

MOV R1 R0
ROR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

OR R1 R0
ASR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

OR R1 R0
ASR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

OR R1 R0
ASR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

OR R1 R0
LSR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

OR R1 R0
LSR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

OR R1 R0
LSR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1

OR R1 R0
ROR R1 R1
ROR R0 R1
ASR R1 R1
ROR R0 R1

OR R1 R0
ROR R1 R1
ROR R0 R1
LSR R1 R1
ROR R0 R1

OR R1 R0
ROR R1 R1
ROR R0 R1
ROR R1 R1
ROR R0 R1
..
LSR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
ADC R0 R1

LSR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
ADD R0 R1

LSR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
EOR R0 R1

LSR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
OR R0 R1

LSR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
ADC R0 R1

LSR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
ADD R0 R1

LSR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
EOR R0 R1

LSR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
OR R0 R1

LSR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
ADC R0 R1

LSR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
ADD R0 R1

LSR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
EOR R0 R1

LSR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
OR R0 R1
.
ROR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
ADC R0 R1

ROR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
ADD R0 R1

ROR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
EOR R0 R1

ROR R0 R1
ROR R1 R1
ASR R0 R1
ROR R1 R1
OR R0 R1

ROR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
ADC R0 R1

ROR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
ADD R0 R1

ROR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
EOR R0 R1

ROR R0 R1
ROR R1 R1
LSR R0 R1
ROR R1 R1
OR R0 R1

ROR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
ADC R0 R1

ROR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
ADD R0 R1

ROR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
EOR R0 R1

ROR R0 R1
ROR R1 R1
ROR R0 R1
ROR R1 R1
OR R0 R1

SWAP R0
ADC R0 R0
ADC R0 R1
ADC R0 R0
ADC R0 R1

SWAP R0
ADC R0 R0
ADC R0 R1
ADD R0 R0
ADC R0 R1

SWAP R0
ADD R0 R0
ADC R0 R1
ADC R0 R0
ADC R0 R1

SWAP R0
ADD R0 R0
ADC R0 R1
ADD R0 R0
ADC R0 R1

reply via email to

[Prev in Thread] Current Thread [Next in Thread]