[Qemu-devel] Expensive emulation of CPU condition flags

From:	Shuang Zhai
Subject:	[Qemu-devel] Expensive emulation of CPU condition flags
Date:	Thu, 30 Jun 2016 18:13:56 +0000

Hi everyone.


When running an ARMv7 guest on an x86 host, we observed that a single guest
instruction that affects the condition flags is often translated into ten or
more host instructions. The cause seems to be the way the frontend emulates
the condition flags. For instance:


Target ARM instruction:

cmp  r9, 0x21 ;


IR instructions:

movi_i32 tmp5,$0x21
sub_i32 NF,r9,tmp5
mov_i32 ZF,NF
setcond_i32 CF,r9,tmp5,geu
xor_i32 VF,NF,r9
xor_i32 tmp7,r9,tmp5
and_i32 VF,VF,tmp7
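To make the IR above easier to follow, here is a minimal C sketch of what the seven ops compute. It is my reading of the IR, not code taken from QEMU: the struct `ArmFlags` and the helpers `emulate_cmp` and `flag_*` are hypothetical names. The sketch assumes that N and V are read from bit 31 of `NF` and `VF`, that Z is set when `ZF` is zero, and that `CF` holds 0 or 1, which is what the `mov_i32 ZF,NF` and `setcond_i32` ops suggest.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the four flag variables named in the IR. */
typedef struct {
    uint32_t NF;  /* assumed: N flag is bit 31 of this value */
    uint32_t ZF;  /* assumed: Z flag is set when this value is zero */
    uint32_t CF;  /* assumed: C flag stored as 0 or 1 */
    uint32_t VF;  /* assumed: V flag is bit 31 of this value */
} ArmFlags;

/* Model of "cmp rn, imm": one step per IR op shown above. */
static void emulate_cmp(ArmFlags *f, uint32_t rn, uint32_t imm)
{
    uint32_t res = rn - imm;          /* sub_i32 NF,r9,tmp5 */
    f->NF = res;
    f->ZF = res;                      /* mov_i32 ZF,NF */
    f->CF = (rn >= imm);              /* setcond_i32 CF,r9,tmp5,geu */
    f->VF = (res ^ rn) & (rn ^ imm);  /* xor_i32, xor_i32, and_i32 */
}

/* Readers that turn the stored words back into single-bit flags. */
static int flag_n(const ArmFlags *f) { return (int)(f->NF >> 31); }
static int flag_z(const ArmFlags *f) { return f->ZF == 0; }
static int flag_c(const ArmFlags *f) { return (int)f->CF; }
static int flag_v(const ArmFlags *f) { return (int)(f->VF >> 31); }
```

Each of the four flag words is written separately, which matches the four stores to the CPU state structure seen in the host code.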


Host x86 instructions:

sub    $0x21,%ebx
mov    %ebx,0x208(%r14)
mov    %ebx,%r12d
mov    %r12d,0x20c(%r14)
cmp    $0x21,%ebp
setae  %r13b
movzbl %r13b,%r13d
mov    %r13d,0x200(%r14)
xor    %ebp,%ebx
xor    $0x21,%ebp
and    %ebp,%ebx
mov    %ebx,0x204(%r14)


Imagine a tight loop in which a cmp instruction computes the termination
condition: this can be pretty expensive. Lazy evaluation does not seem to
help here.
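For readers unfamiliar with the term, the following C sketch shows what lazy flag evaluation means in general. It is an illustration only, not QEMU's implementation: `LazyFlags`, `CcOp`, `lazy_cmp` and the `lazy_*` readers are hypothetical names, and only subtraction is modeled. The flag-setting instruction records its operands, and each flag is derived only when some later instruction reads it.

```c
#include <stdint.h>

/* Hypothetical tag for the operation that last set the flags. */
typedef enum { CC_OP_SUB } CcOp;

/* Hypothetical lazy state: remember the operation, not its flags. */
typedef struct {
    CcOp op;
    uint32_t src1;
    uint32_t src2;
} LazyFlags;

/* "cmp rn, imm" only records its inputs; no flag is computed yet. */
static void lazy_cmp(LazyFlags *s, uint32_t rn, uint32_t imm)
{
    s->op = CC_OP_SUB;
    s->src1 = rn;
    s->src2 = imm;
}

/* Each flag is derived on demand from the recorded operands. */
static int lazy_n(const LazyFlags *s)
{
    return (int)((s->src1 - s->src2) >> 31);
}

static int lazy_z(const LazyFlags *s) { return s->src1 == s->src2; }

/* ARM carry after a subtraction means "no borrow". */
static int lazy_c(const LazyFlags *s) { return s->src1 >= s->src2; }

static int lazy_v(const LazyFlags *s)
{
    uint32_t res = s->src1 - s->src2;
    return (int)(((res ^ s->src1) & (s->src1 ^ s->src2)) >> 31);
}

/* ARM "ge" condition: true when N equals V. */
static int cond_ge(const LazyFlags *s) { return lazy_n(s) == lazy_v(s); }
```

In a loop where the cmp is immediately followed by a conditional branch, the flags are consumed right away, so deferring their computation saves little. That may be why laziness does not pay off in the case described above.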


We wonder whether any optimization exists, e.g., directly mapping the frontend
flags to those of the backend. Any suggestions are appreciated.
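As a rough illustration of what such a direct mapping would involve for a subtraction, the C sketch below converts the x86 EFLAGS bits left by a host cmp/sub into ARM NZCV bits. The function name `x86_sub_flags_to_arm_nzcv` is hypothetical and nothing here comes from QEMU; the bit positions are the architectural ones (x86 CF/ZF/SF/OF at bits 0/6/7/11, ARM N/Z/C/V at bits 31/30/29/28). One wrinkle is that the carry flag has opposite meanings after a subtraction: x86 sets CF on borrow, while ARM sets C when there is no borrow.

```c
#include <stdint.h>

/* x86 EFLAGS bit positions. */
#define X86_CF (1u << 0)
#define X86_ZF (1u << 6)
#define X86_SF (1u << 7)
#define X86_OF (1u << 11)

/* ARM CPSR condition flag bit positions. */
#define ARM_N (1u << 31)
#define ARM_Z (1u << 30)
#define ARM_C (1u << 29)
#define ARM_V (1u << 28)

/*
 * Hypothetical helper: map the host flags produced by an x86 cmp/sub
 * onto the guest NZCV bits for the equivalent ARM cmp/subs.
 */
static uint32_t x86_sub_flags_to_arm_nzcv(uint32_t eflags)
{
    uint32_t nzcv = 0;

    if (eflags & X86_SF) {
        nzcv |= ARM_N;          /* sign flag maps to N */
    }
    if (eflags & X86_ZF) {
        nzcv |= ARM_Z;          /* zero flag maps to Z */
    }
    if (!(eflags & X86_CF)) {
        nzcv |= ARM_C;          /* inverted: ARM C means "no borrow" */
    }
    if (eflags & X86_OF) {
        nzcv |= ARM_V;          /* signed overflow maps to V */
    }
    return nzcv;
}
```

This covers only the subtraction case. Whether host flags can be kept live between a guest cmp and the instruction that consumes them is a separate question that this sketch does not address.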


Shuang
