avr-gcc-list
Re: [avr-gcc-list] Inversion of logic improves size speed


From: Anatoly Sokolov
Subject: Re: [avr-gcc-list] Inversion of logic improves size speed
Date: Mon, 6 Aug 2007 21:01:54 +0400

Hi,

From: "Wouter van Gulik" <address@hidden>
Sent: Sunday, August 05, 2007 11:46 PM


> After some testing I found out that inverting the order of the shift
> and the 'and' operation can significantly improve speed and size. In
> the first case the compiler misses that it can optimise the shifts for
> bits 4..7 by first swapping nibbles, which it does figure out when the
> code is rewritten as in the lower part.
> 
> Is this a (known?) bug or am I missing something?
> 

Yes, these are known bugs:

Bug #11259 [avr] gcc Double 'andi' missed optimization:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259

Bug #29560 Poor optimization for character shifts on Atmel AVR:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560


Testcase:

unsigned char 
getBit4InvShift(unsigned char  temp) 
{ 
  unsigned char r = 0; 
  if((temp>>4)&1) r|=0x1; 
  return r; 
}
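
For comparison, here is a minimal sketch of the two source forms under discussion. The mask-first variant is only an assumption about what the rewritten code in the original post looks like (it is not quoted here), and the names getBit4Mask and check_bit4_forms are hypothetical. Both forms return the same value for every input, so any difference is purely in code generation:

```c
#include <assert.h>

/* Shift-first form, as in the testcase above. */
unsigned char getBit4InvShift(unsigned char temp)
{
  unsigned char r = 0;
  if ((temp >> 4) & 1) r |= 0x1;
  return r;
}

/* Mask-first form (hypothetical name): test bit 4 in place, no shift. */
unsigned char getBit4Mask(unsigned char temp)
{
  unsigned char r = 0;
  if (temp & (1 << 4)) r |= 0x1;
  return r;
}

/* Exhaustively check that both forms agree on all 256 inputs. */
void check_bit4_forms(void)
{
  unsigned int v;
  for (v = 0; v < 256; v++)
    assert(getBit4InvShift((unsigned char)v) == getBit4Mask((unsigned char)v));
}
```

Calling check_bit4_forms() from a small host-side main() is enough to confirm the equivalence.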

This code compiles to the following insns:

/* frame size = 0 */
.LM2:
 ; (insn 6 3 28 demo.c:4 (set (reg:QI 24 r24 [44])
 ;         (lshiftrt:QI (reg:QI 24 r24 [ temp ])
 ;             (const_int 4 [0x4]))) 66 {lshrqi3} (nil))
        swap r24         ;  6   lshrqi3/5       [length = 2]
        andi r24,0x0f
.LVL1:
.LM3:
 ; (insn 12 7 18 demo.c:8 (set (reg/i:QI 24 r24 [ <result> ])
 ;         (and:QI (reg:QI 24 r24 [44])
 ;             (const_int 1 [0x1]))) 41 {andqi3} (nil))
        andi r24,lo8(1)  ;  12  andqi3/2        [length = 1]
/* epilogue start */


The lshrqi3 patterns are defined as opaque "macro" sequences, so the 'andi'
instruction emitted by the lshrqi3 insn (#6) is never exposed to GCC's RTL
optimizers.

I tried implementing the 'lshrqi3' insn for "r >> 4" as a
'define_insn_and_split "*lshrqi3_const4"':

(define_insn "rotlqi3"
  [(set (match_operand:QI 0 "register_operand" "=r")
        (rotate:QI (match_operand:QI 1 "register_operand" "0")
                   (const_int 4)))]
  ""
  "swap %0"
  [(set_attr "length" "1")
   (set_attr "cc" "none")])

;; >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >>
;; logical shift right

(define_expand "lshrqi3"
  [(set (match_operand:QI 0 "register_operand"             "=r,r,r,r,!d,r,r")
        (lshiftrt:QI (match_operand:QI 1 "register_operand" "0,0,0,0,0,0,0")
                     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,Qm")))]
  ""
  "")

(define_insn_and_split "*lshrqi3_const4"
  [(set (match_operand:QI 0 "d_register_operand"             "=d")
        (lshiftrt:QI (match_operand:QI 1 "d_register_operand" "0")
                     (const_int 4)))]
  ""
  "#"
  ""
  [(set (match_dup 0) (rotate:QI (match_dup 0) (const_int 4)))
   (set (match_dup 0) (and:QI (match_dup 0) (const_int 15)))]
  "")

(define_insn "*lshrqi3"
  [(set (match_operand:QI 0 "register_operand"             "=r,r,r,r,!d,r,r")
        (lshiftrt:QI (match_operand:QI 1 "register_operand" "0,0,0,0,0,0,0")
                     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,Qm")))]
  ""
  "* return lshrqi3_out (insn, operands, NULL);"
  [(set_attr "length" "5,0,1,2,4,6,9")
   (set_attr "cc" "clobber,none,set_czn,set_czn,set_czn,set_czn,clobber")])
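
The identity this split relies on can be checked on a host machine. The following is only a C model of the RTL, not compiler code: swap_nibbles stands in for the AVR 'swap' instruction (a rotate by 4), and all function names are hypothetical.

```c
#include <assert.h>

/* Model of the AVR 'swap' instruction: exchange the two nibbles,
   i.e. rotate an 8-bit value by 4. */
unsigned char swap_nibbles(unsigned char x)
{
  return (unsigned char)((x << 4) | (x >> 4));
}

/* The split sequence: rotate by 4, then 'and' with 15. */
unsigned char lshr4_split(unsigned char x)
{
  return (unsigned char)(swap_nibbles(x) & 15);
}

/* Check that the split sequence equals a logical shift right by 4
   for every 8-bit input. */
void check_lshr4_split(void)
{
  unsigned int v;
  for (v = 0; v < 256; v++)
    assert(lshr4_split((unsigned char)v) == (unsigned char)(v >> 4));
}
```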


As a result, the following code is now generated:

.LM2:
 ; (insn 23 3 30 demo.c:4 (set (reg:QI 24 r24 [44])
 ;         (rotate:QI (reg:QI 24 r24 [44])
 ;             (const_int 4 [0x4]))) 66 {rotlqi3} (nil))
        swap r24         ;  23  rotlqi3 [length = 1]
.LVL1:
 ; (insn 24 30 7 demo.c:4 (set (reg:QI 24 r24 [44])
 ;         (and:QI (reg:QI 24 r24 [44])
 ;             (const_int 15 [0xf]))) 41 {andqi3} (nil))
        andi r24,lo8(15)         ;  24  andqi3/2        [length = 1]
.LM3:
 ; (insn 12 7 18 demo.c:8 (set (reg/i:QI 24 r24 [ <result> ])
 ;         (and:QI (reg:QI 24 r24 [44])
 ;             (const_int 1 [0x1]))) 41 {andqi3} (nil))
        andi r24,lo8(1)  ;  12  andqi3/2        [length = 1]


There are now two 'and' insns (#24 and #12), but they are not merged yet. Why?
The probable reason is that the 'lshiftrt' insn is split into 'rotate' and
'and' insns in the 'pass_split_after_reload' pass of the compiler, whereas the
optimization passes (combine and cse) that could merge the two 'and' insns
run earlier.

It is possible to add a peephole to merge the two 'and' insns, but I do not
think that this solution is optimal.
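
The merge itself is simple: two consecutive 'and' insns with constants on the same register are equivalent to a single 'and' with the bitwise and of both constants. As a rough C model of that effect (not an actual peephole definition; the function names are hypothetical), for this testcase 15 & 1 == 1:

```c
#include <assert.h>

/* The sequence currently generated: andi 15 followed by andi 1. */
unsigned char two_ands(unsigned char x)
{
  x &= 15;
  x &= 1;
  return x;
}

/* What a merged insn would compute: a single 'and' with 15 & 1. */
unsigned char merged_and(unsigned char x)
{
  x &= (15 & 1);
  return x;
}

/* Check that both sequences agree on all 256 inputs. */
void check_and_merge(void)
{
  unsigned int v;
  for (v = 0; v < 256; v++)
    assert(two_ands((unsigned char)v) == merged_and((unsigned char)v));
}
```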

Mine...

Anatoly.






