Re: [Qemu-devel] About virtio device hotplug in Q35! 【外域邮件.谨慎查阅】

qemu-devel


From:	Bob Chen
Subject:	Re: [Qemu-devel] About virtio device hotplug in Q35! 【外域邮件.谨慎查阅】
Date:	Thu, 30 Nov 2017 16:06:54 +0800

Hi,

After three months of work and investigation, and tedious mail discussions
with Nvidia, I think some progress has been made on GPUDirect (P2P) in a
virtual environment.

The only remaining issue is the low bidirectional bandwidth between two
sibling GPUs under the same PCIe switch.
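
For a yardstick on what "low" means here, the theoretical link ceiling can be
worked out as below. This is only a sketch: it assumes a PCIe Gen3 x16 link,
which this mail does not state, and the function name is made up for
illustration. Measured P2P numbers will always land below this figure because
of protocol overhead.

```python
# Theoretical PCIe throughput, as a reference point for P2P bandwidth tests.
# Assumption: a Gen3 x16 link; the negotiated link is not given in this mail.

GEN3_GT_PER_S = 8.0          # PCIe 3.0 raw rate per lane, in GT/s
GEN3_ENCODING = 128 / 130    # 128b/130b line encoding used by PCIe 3.0


def pcie_gbytes_per_s(lanes, gt_per_s=GEN3_GT_PER_S, encoding=GEN3_ENCODING):
    """Per-direction ceiling in GB/s after line encoding, before packet overhead."""
    return lanes * gt_per_s * encoding / 8  # 8 bits per byte


one_way = pcie_gbytes_per_s(16)
print(f"x16 unidirectional ceiling: {one_way:.2f} GB/s")
print(f"x16 bidirectional ceiling:  {2 * one_way:.2f} GB/s")
```

A bidirectional result far below twice the unidirectional one is the kind of
gap described above.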

We expanded the tests to run on more GPU cards, so the results now seem
clear.


P40 is OK, and its hardware topology on the host is:
 \-[0000:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
             +-01.0-[03]----00.0  LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury]
             +-02.0-[04]----00.0  NVIDIA Corporation GP102GL [Tesla P40]
             +-03.0-[02]----00.0  NVIDIA Corporation GP102GL [Tesla P40]


M60, not OK, low bandwidth:
 \-[0000:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
             +-01.0-[06]----00.0  LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury]
             +-02.0-[07-0a]----00.0-[08-0a]--+-08.0-[09]----00.0  NVIDIA Corporation GM204GL [Tesla M60]
             |                               \-10.0-[0a]----00.0  NVIDIA Corporation GM204GL [Tesla M60]


V100, not OK, low bandwidth:
 \-[0000:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
             +-01.0-[01]--+-00.0  Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
             |            \-00.1  Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
             +-02.0-[02-05]----00.0-[03-05]--+-08.0-[04]----00.0  NVIDIA Corporation GV100 [Tesla V100 PCIe]
             |                               \-10.0-[05]----00.0  NVIDIA Corporation GV100 [Tesla V100 PCIe]
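
The difference between these trees can also be checked programmatically. The
sketch below finds the deepest PCI hop that two devices share, given their
resolved sysfs paths. The helper names are hypothetical, and the example paths
are written out by hand from the P40 and V100 listings above rather than read
from a live host.

```python
def pci_chain(sysfs_path):
    """Split a resolved sysfs device path into its PCI hops, root first."""
    parts = sysfs_path.strip("/").split("/")
    # Keep everything from the root complex entry ("pciDDDD:BB") downward.
    start = next(i for i, p in enumerate(parts) if p.startswith("pci"))
    return parts[start:]


def common_upstream(path_a, path_b):
    """Return the deepest hop shared by both devices, or None if there is none."""
    shared = None
    for hop_a, hop_b in zip(pci_chain(path_a), pci_chain(path_b)):
        if hop_a != hop_b:
            break
        shared = hop_a
    return shared


# Paths transcribed from the listings above (assumed PCI domain 0000).
V100_A = "/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:08.0/0000:04:00.0"
V100_B = "/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:10.0/0000:05:00.0"
P40_A = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0"
P40_B = "/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0"

print(common_upstream(V100_A, V100_B))  # the switch upstream port, 0000:02:00.0
print(common_upstream(P40_A, P40_B))    # only the root complex, pci0000:00
```

On a running host the input paths could be obtained with
os.path.realpath("/sys/bus/pci/devices/<bdf>") for each GPU. The V100 pair
shares a switch port, while the P40 pair shares nothing below the root
complex.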



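So what effect might the PLX switch hardware actually have on the GPU data
flow, even though the switch is not visible in the guest OS? The M60 and V100
pairs both sit behind such a switch on the host, while the two P40s attach
directly to root ports.
Nvidia's tech-support people are not familiar with virtualization, so they
asked us to consult the community first.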
