From: Bob Chen
Subject: Re: [Qemu-devel] About virtio device hotplug in Q35! [External email, view with caution]
Date: Tue, 29 Aug 2017 18:41:44 +0800

The topology already has all GPUs directly attached to root bus 0. In this
situation you can't see the LnkSta attribute in any of the capabilities.

The other approach, using an emulated switch, does show this attribute at
8 GT/s, although the real bandwidth stays as low as before.
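
For reference, a rough sketch of the two command-line topologies being
compared, assuming two GPUs assigned through vfio-pci; the host BDFs, IDs,
addresses and chassis/slot numbers are placeholders, and only the
device-related options are shown:

    # GPUs attached directly to the root bus (pcie.0) of a Q35 machine
    qemu-system-x86_64 -machine q35,accel=kvm \
        -device vfio-pci,host=0000:81:00.0,bus=pcie.0,addr=0x10 \
        -device vfio-pci,host=0000:82:00.0,bus=pcie.0,addr=0x11

    # GPUs behind an emulated switch (root port -> upstream -> downstream)
    qemu-system-x86_64 -machine q35,accel=kvm \
        -device ioh3420,id=rp1,bus=pcie.0,chassis=1,slot=1 \
        -device x3130-upstream,id=up1,bus=rp1 \
        -device xio3130-downstream,id=dp1,bus=up1,chassis=2,slot=0 \
        -device xio3130-downstream,id=dp2,bus=up1,chassis=3,slot=0 \
        -device vfio-pci,host=0000:81:00.0,bus=dp1 \
        -device vfio-pci,host=0000:82:00.0,bus=dp2

The second layout is the one where the guest sees emulated downstream ports
above the GPUs.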

2017-08-23 2:06 GMT+08:00 Michael S. Tsirkin <address@hidden>:

> On Tue, Aug 22, 2017 at 10:56:59AM -0600, Alex Williamson wrote:
> > On Tue, 22 Aug 2017 15:04:55 +0800
> > Bob Chen <address@hidden> wrote:
> >
> > > Hi,
> > >
> > > I got a spec from Nvidia which illustrates how to enable GPU p2p in a
> > > virtualization environment. (See attached)
> >
> > Neat, looks like we should implement a new QEMU vfio-pci option,
> > something like nvidia-gpudirect-p2p-id=.  I don't think I'd want to
> > code the policy of where to enable it into QEMU or the kernel, so we'd
> > push it up to management layers or users to decide.
> >
> > > The key is to append the legacy PCI capabilities list when setting up
> > > the hypervisor, with an Nvidia-customized capability config.
> > >
> > > I added a hack in hw/vfio/pci.c and managed to implement that.
> > >
> > > Then I found the GPU was able to recognize its peer, and the latency
> > > dropped. ✅
> > >
> > > However, the bandwidth didn't improve; it actually decreased. ❌
> > >
> > > Any suggestions?
> >
> > What's the VM topology?  I've found that in a Q35 configuration with
> > GPUs downstream of an emulated root port, the NVIDIA driver in the
> > guest will downshift the physical link rate to 2.5GT/s and never
> > increase it back to 8GT/s.  I believe this is because the virtual
> > downstream port only advertises Gen1 link speeds.
>
>
> Fixing that would be nice, and it's great that you now actually have a
> reproducer that can be used to test it properly.
>
> Exposing higher link speeds is a bit of work since there are now all
> kinds of corner cases to cover, as guests may play with link speeds and
> we must pretend to change them accordingly.  An especially interesting
> question is what to do with the assigned device when the guest tries to
> play with the port link speed. It's kind of similar to AER in that respect.
>
> I guess we can just ignore it for starters.
>
> >  If the GPUs are on
> > the root complex (i.e. pcie.0), the physical link will run at 2.5GT/s
> > when the GPU is idle and upshift to 8GT/s under load.  This also
> > happens if the GPU is exposed to the VM in a conventional PCI
> > topology.  Another interesting data point is that an older Kepler GRID
> > card does not have this issue, dynamically shifting the link speed
> > under load regardless of the VM PCI/e topology, while a newer M60
> > using the same driver experiences this problem.  I've filed a bug with
> > NVIDIA as this seems to be a regression, but it appears (untested)
> > that the hypervisor should take the approach of exposing full,
> > up-to-date PCIe link capabilities and reporting a link status matching
> > the downstream devices.
>
>
> > I'd suggest that during your testing you watch the lspci info for the
> > GPU from the host, noting the behavior of LnkSta (Link Status) to check
> > whether the device gets stuck at 2.5GT/s in your VM configuration, and
> > adjust the topology until it works, likely placing the GPUs on pcie.0
> > for a Q35-based machine.  Thanks,
> >
> > Alex
>
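
For the host-side check suggested above, something along these lines can be
used to watch the physical link while running a load in the guest (81:00.0
is a placeholder for the assigned GPU's BDF; run as root so the full config
space is readable):

    # Watch the negotiated link speed/width of the GPU from the host
    watch -n 1 "lspci -s 81:00.0 -vvv | grep -i 'LnkCap\|LnkSta'"

    # Or read Link Capabilities / Link Status raw from the PCIe capability
    # (offsets 0x0c and 0x12 within the Express capability structure)
    setpci -s 81:00.0 CAP_EXP+0x0c.l    # LnkCap
    setpci -s 81:00.0 CAP_EXP+0x12.w    # LnkSta

A LnkSta that stays at 2.5GT/s under load would point to the downshift
behavior Alex describes.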

