
Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
Date: Fri, 14 Nov 2008 13:23:35 +0000
User-agent: Mutt/1.5.13 (2006-08-11)

Avi Kivity wrote:
> Jamie Lokier wrote:
> >But does the fact KVM doesn't use TCG prevent KVM from running some
> >x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
> >VM86 mode, which is not exactly correct.  It would be nice to have KVM
> >acceleration but also complete and correct emulation, by switching to
> >TCG for those modes.
> 
> There is work in progress to make 16-bit emulation fully accurate.

Ooh!  I want my Windows 95 to run in KVM :-)
I'm curious, how is this planned to work?

I'm having trouble thinking of how to do it without software emulation
at some stage.

> >Also, an earlier thread pointed out that loops doing a lot of MMIO are
> >_slower_ with KVM than without - this manifested as very slow VGA
> >output for some guests.  Having KVM pass control to TCG for short runs
> >of guest instructions which do MMIO, or other instructions which need
> >to be emulated, would accelerate KVM in this respect.

(I think VMware does something like this, btw).

> Since TCG is not smp-safe, this is very problematic for smp guests.  You 
> would have to stop virtualization on all vcpus and start tcg on all of 
> them.  Performance would plummet.

On the other hand, when running on a KVM-capable host/guest
architecture combination, it is definitely possible to make TCG
smp-safe, because every guest atomic instruction has a corresponding
host one.  It's practically a 1:1 instruction mapping on x86, which
doesn't have many atomic instructions.  (Maybe harder on other archs).
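
To make the 1:1 mapping concrete, here is a rough sketch of how an
emulator could implement two guest atomic instructions with host
atomics.  It is not TCG code: GuestCpu, emu_lock_cmpxchg32 and
emu_lock_xadd32 are hypothetical names, and it assumes guest memory is
mapped directly into the host address space.  It uses C11 atomics,
which on an x86 host typically compile down to the matching LOCK
instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal guest CPU state: just what CMPXCHG touches. */
typedef struct {
    uint32_t eax;   /* guest accumulator */
    bool zf;        /* guest zero flag */
} GuestCpu;

/*
 * Emulate a guest 32-bit LOCK CMPXCHG.
 * x86 semantics: if EAX == *mem, store src and set ZF;
 * otherwise clear ZF and load the current *mem into EAX.
 */
static void emu_lock_cmpxchg32(GuestCpu *cpu, _Atomic uint32_t *mem,
                               uint32_t src)
{
    uint32_t expected = cpu->eax;

    /* On failure, 'expected' is overwritten with the value found in
     * memory, which is exactly what the guest expects to see in EAX. */
    cpu->zf = atomic_compare_exchange_strong(mem, &expected, src);
    cpu->eax = expected;
}

/*
 * Emulate a guest 32-bit LOCK XADD: add to memory atomically and
 * return the previous memory value (destined for the source register).
 */
static uint32_t emu_lock_xadd32(_Atomic uint32_t *mem, uint32_t addend)
{
    return atomic_fetch_add(mem, addend);
}
```

Because each helper is a single host atomic operation, two emulated
vcpus running on different host threads cannot interleave inside it.
That covers atomicity only; memory ordering between non-atomic guest
accesses is a separate question on hosts weaker than x86.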

> There are ways of mitigating the high mmio cost with kvm.  For 
> framebuffers, one can allow kvm direct access.  For other mmio, there's 
> the 'coalesced mmio' support which allows mmio to be batched when this 
> does not affect emulation accuracy and latency.

Don't you still have to trap for each MMIO in order to collect the
batch, except for REP instructions?  It's the traps which are expensive.
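
To illustrate what I mean, here is a toy model of batching.  It is
only a sketch under my own assumptions, not KVM's actual coalesced
MMIO code: MmioBatcher, mmio_trap_write, mmio_flush and RING_SIZE are
made-up names.  Each write still enters mmio_trap_write once (the
trap); only the heavier hand-off to the device model is deferred until
the ring fills or is flushed.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64            /* hypothetical batch capacity */

typedef struct {
    uint64_t addr;
    uint32_t len;
    uint64_t data;
} MmioWrite;

typedef struct {
    MmioWrite ring[RING_SIZE];
    size_t count;               /* writes currently queued */
    unsigned long traps;        /* one per guest MMIO write */
    unsigned long flushes;      /* one per hand-off to the device model */
    unsigned long delivered;    /* writes the device model has seen */
} MmioBatcher;

/* Stand-in for the device model; a real one would update device state. */
static void device_write(MmioBatcher *b, const MmioWrite *w)
{
    (void)w;
    b->delivered++;
}

/* Deliver all queued writes in order, then empty the ring. */
static void mmio_flush(MmioBatcher *b)
{
    if (b->count == 0)
        return;
    for (size_t i = 0; i < b->count; i++)
        device_write(b, &b->ring[i]);
    b->count = 0;
    b->flushes++;
}

/* Called once per trapped guest write: queue it, flushing if full. */
static void mmio_trap_write(MmioBatcher *b, uint64_t addr, uint32_t len,
                            uint64_t data)
{
    b->traps++;
    if (b->count == RING_SIZE)
        mmio_flush(b);
    b->ring[b->count++] = (MmioWrite){ addr, len, data };
}
```

In this model the flush count drops with the batch size, but the trap
count equals the number of guest writes, so the saving depends on how
much cheaper a bare trap is than a full hand-off.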

Fortunately, modern hardware tends to use DMA for data-intensive
things, and MMIO only to trigger DMA and for initialisation.

-- Jamie



