From: Tian, Kevin
Subject: Re: [Qemu-devel] <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Thu, 4 Feb 2016 03:01:36 +0000

> From: Alex Williamson [mailto:address@hidden
> Sent: Thursday, February 04, 2016 4:45 AM
> >
> > First, Jike told me before his vacation that we cannot make any changes to
> > the KVM module, according to community comments. Now I think that's not true.
> > We can make necessary changes, as long as they are done in a structural/layered
> > approach, w/o a hard assumption of KVMGT as the only user. That's the
> > guideline we need to obey. :-)
> 
> We certainly need to separate the functionality that you're trying to
> enable from the more pure concept of vfio.  vfio is a userspace driver
> interface, not a userspace driver interface for KVM-based virtual
> machines.  Maybe it's more of a gimmick that we can assign PCI devices
> to QEMU tcg VMs, but that's really just the proof of concept for more
> useful capabilities, like supporting DPDK applications.  So, I
> begrudgingly agree that structured/layered interactions are acceptable,
> but consider what use cases may be excluded by doing so.

Understood. We shouldn't assume VFIO is always connected to KVM. For
example, once we have vfio-vgpu ready, it can also be used to drive container
usage, not always paired with KVM/Qemu. Actually, thinking more from this
angle, there is a new open question which I'll describe at the end...

> >
> > 4) Map/unmap guest memory
> > --
> > It's there for KVM.
> 
> Map and unmap for who?  For the vGPU or for the VM?  It seems like we
> know how to map guest memory for the vGPU without KVM, but that's
> covered in 7), so I'm not entirely sure what this is specifying.

Map guest memory for emulation purposes in the vGPU device model, e.g. to
r/w the guest GPU page table, command buffers, etc. It's a basic requirement
of any device model.

7) provides the database (both GPA->IOVA and GPA->HPA), where GPA->HPA
can be used to implement this interface for KVM. For Xen it's different,
however, since a special foreign-domain mapping hypercall is involved, which
is Xen-specific and therefore not appropriate to put in VFIO.

That's why we list this interface separately as a key requirement (though
it's obvious for KVM).
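
To make 4) concrete, here is a minimal sketch of what a hypervisor-neutral
map/unmap helper could look like (all names are illustrative, not an existing
interface): on KVM it could be backed by the GPA->HPA database from 7), while
on Xen the same hooks would wrap the foreign-domain mapping hypercall.

#include <linux/types.h>

/* Illustrative only -- not an existing API. Hooks the device model
 * uses to temporarily map guest memory for emulation. */
struct vgpu_guest_mem_ops {
        /* Map 'len' bytes of guest memory at 'gpa' into the host
         * kernel address space (e.g. a guest GPU page table or a
         * command buffer the device model needs to parse). */
        void *(*map)(void *vm, unsigned long gpa, size_t len);
        void  (*unmap)(void *vm, void *va, size_t len);
};

/* Example use in the device model: read one guest GPU PTE. */
static u64 read_guest_gtt_entry(void *vm,
                                const struct vgpu_guest_mem_ops *ops,
                                unsigned long gpa)
{
        u64 *pte = ops->map(vm, gpa, sizeof(*pte));
        u64 val = pte ? *pte : 0;

        if (pte)
                ops->unmap(vm, pte, sizeof(*pte));
        return val;
}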

> 
> > 5) Pin/unpin guest memory
> > --
> > IGD or any PCI passthru should have the same requirement, so we should be
> > able to leverage existing code in VFIO. The only tricky thing (Jike may
> > elaborate after he is back) is that KVMGT requires the EPT entry to be pinned
> > too, which requires some further changes on the KVM side. But I'm not sure
> > whether it still holds true after some design changes made in this thread,
> > so I'll leave it to Jike to further comment.
> 
> PCI assignment requires pinning all of guest memory, I would think that
> IGD would only need to pin selective memory, so is this simply stating
> that both have the need to pin memory, not that they'll do it to the
> same extent?

For simplicity let's pin all memory first, taking selective pinning as a
future enhancement.

The tricky thing is that the existing 'pin' action in VFIO doesn't actually
pin the EPT entry too (it only pins the host page tables for the Qemu
process). There are various places where EPT entries might be invalidated
while the guest is running, whereas KVMGT requires the EPT entries to stay
pinned as well. Let's wait for Jike to elaborate on whether this part is
still required today.
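
For reference, the existing VFIO type1 pin path boils down to something like
the sketch below (heavily simplified, and the gup signature has varied across
kernel versions): it populates and refcounts the host pages backing the Qemu
process, but never touches the EPT, which KVM maintains separately.

#include <linux/mm.h>

/* Simplified sketch of host-side pinning as done for assigned devices.
 * Note nothing here pins the guest's EPT entries -- those live in KVM
 * and can still be zapped/invalidated while the guest runs. */
static int pin_host_pages(unsigned long useraddr, int npages,
                          struct page **pages)
{
        /* Exact gup flags/signature differ between kernel versions;
         * read this as "pin these user pages for write". */
        return get_user_pages_fast(useraddr, npages, 1 /* write */, pages);
}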

> 
> > 6) Write-protect a guest memory page
> > --
> > The primary purpose is for GPU page table shadowing. We need to track
> > modifications on the guest GPU page table, so the shadow part can be
> > synchronized accordingly. Just think of CPU page table shadowing. An old
> > example, as Zhiyuan pointed out, is to write-protect the guest cmd buffer,
> > but that is no longer necessary now.
> >
> > So we need KVM to provide an interface so some agents can request such
> > write-protection action (not just for KVMGT. could be for other tracking
> > usages). Guangrong has been working on a general page tracking mechanism,
> > upon which write-protection can be easily built. The review is still in
> > progress.
> 
> I have a hard time believing we don't have the mechanics to do this
> outside of KVM.  We should be able to write protect user pages from the
> kernel, this is how copy-on-write generally works.  So it seems like we
> should be able to apply those same mechanics to our userspace process,
> which just happens to be a KVM VM.  I'm hoping that Paolo might have
> some ideas how to make this work or maybe Intel has some virtual memory
> experts that can point us in the right direction.

What we want to write-protect is access that happens inside the VM.
I don't see how any trick in the host page tables can help here. The only
way is to tweak the page tables used in non-root mode (either EPT or the
shadow page table).
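
To illustrate the shape of the interface we're asking KVM for (loosely in the
spirit of Guangrong's page-track series; the names and signatures below are
illustrative, not the final API): the device model write-protects the GFNs
backing the guest GPU page table in the non-root page tables and gets a
callback on every guest write, so it can resync the shadow.

/* Illustrative sketch of a guest write-tracking hook -- not the
 * actual KVM interface under review. */
struct guest_write_tracker {
        /* Invoked after a guest write of 'bytes' at 'gpa' to a tracked
         * page has been emulated; the device model resyncs its shadow
         * GPU page table here. */
        void (*on_write)(void *opaque, unsigned long gpa,
                         const void *new_val, int bytes);
        void *opaque;
};

/* Write-protect/unprotect a guest frame in EPT or the shadow page
 * table, and (un)register for write notifications on it. */
int  track_guest_page(void *vm, unsigned long gfn,
                      struct guest_write_tracker *t);
void untrack_guest_page(void *vm, unsigned long gfn,
                        struct guest_write_tracker *t);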

> 
> Thanks for your summary, Kevin.  It does seem like there are only a few
> outstanding issues which should be manageable and hopefully the overall
> approach is cleaner for QEMU, management tools, and provides a more
> consistent user interface as well.  If we can translate the solution to
> Xen, that's even better.  Thanks,
> 

Here is the main open question in my mind, after thinking about the role of VFIO:

The 7 services required by the vGPU device model fall into two
categories:

a) services to connect the vGPU with the VM, which are essentially what a
device driver does (so VFIO fits here), including:
        1) Selectively pass-through a region to a VM
        2) Trap-and-emulate a region
        3) Inject a virtual interrupt
        5) Pin/unpin guest memory
        7) GPA->IOVA/HVA translation (as a side-effect)

b) services to support device emulation, which are going to be
hypervisor-specific, including:
        4) Map/unmap guest memory
        6) Write-protect a guest memory page

VFIO can fulfill category a), but not b). A possible abstraction would
be in the vGPU core driver, allowing a specific hypervisor to register
callbacks for category b) (which means a KVMGT-specific file, say KVM-vGPU,
would be added to KVM to connect the two).

The likely layered blocks would then look like:

VFIO-vGPU  <--------->  vGPU Core  <-------------> KVMGT-vGPU
                        ^       ^
                        |       |
                        |       |
                        v       v
                      nvidia   intel
                       vGPU    vGPU

Xen would register its own vGPU bus driver (not using VFIO today) and
also its hypervisor services, using the same framework. With this design,
everything is abstracted/registered through the vGPU core driver, instead
of the components talking to each other directly.
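
In code form, the registration through the vGPU core might look roughly like
this (all names hypothetical, just to show the layering): each hypervisor
backend (KVM-vGPU, or a Xen counterpart) registers the category b) services,
each vendor driver registers its device model, and VFIO-vGPU or the Xen vGPU
bus driver provides the category a) plumbing on top.

#include <linux/types.h>

/* Hypothetical vGPU core interfaces -- a sketch of the layering only. */

/* Category b): hypervisor-specific services, registered by the
 * KVM-vGPU (or Xen) backend. */
struct vgpu_hypervisor_ops {
        void *(*map_guest_mem)(void *vm, unsigned long gpa, size_t len);   /* 4) */
        void  (*unmap_guest_mem)(void *vm, void *va, size_t len);
        int   (*write_protect_page)(void *vm, unsigned long gfn);          /* 6) */
        int   (*unprotect_page)(void *vm, unsigned long gfn);
};

/* Vendor (intel/nvidia) device model, registered with the core; the
 * bus driver (VFIO-vGPU or Xen's) routes category a) requests here. */
struct vgpu_vendor_ops {
        int  (*create)(void *vgpu, unsigned int type);
        void (*destroy)(void *vgpu);
        int  (*mmio_rw)(void *vgpu, unsigned long offset,
                        void *buf, size_t len, bool is_write);             /* 2) */
};

int vgpu_register_hypervisor(const struct vgpu_hypervisor_ops *ops);
int vgpu_register_vendor(const struct vgpu_vendor_ops *ops);

With that split, the vendor drivers and the bus drivers never call a
hypervisor directly; they only see the callbacks the hypervisor backend
registered with the core.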

Thoughts?

P.S. From the description of the above requirements, the whole framework
might also be extended to cover any device type using the same mediated
pass-through approach. Though graphics has some special requirements, the
majority are actually device-agnostic. Maybe it's better not to limit it
with a vGPU name at all. :-)

Thanks
Kevin
