
Re: [Qemu-devel] VFIO PCIe Extended Capabilities


From: Spenser Gilliland
Subject: Re: [Qemu-devel] VFIO PCIe Extended Capabilities
Date: Tue, 19 Jul 2016 20:39:26 +0000

Hi Marcel,

>> Indeed, if a device is attached to a PCI bus it makes no sense to advertise 
>> the extended configuration space.
>> Can you please share the QEMU command line? Maybe it is possible to make the 
>> device's bus PCIe in QEMU?

Changing the following should make the tweak I proposed irrelevant.

- device vfio-pci,host=03:00.0,id=hostdev0,bus=pci.2,addr=0x4
+ device vfio-pci,host=03:00.0,id=hostdev0,bus=pcie.0,addr=0x4

Here's the full command for reference.

2016-07-19 18:42:22.110+0000: starting up libvirt version: 1.2.17, package: 
13.el7 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-20-16:24:10, 
worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.10.9)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin 
QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name instance-00000019 -S -machine 
pc-q35-rhel7.2.0,accel=kvm,usb=off -cpu 
Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
 -m 8192 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 -uuid 
8b6865eb-2118-430e-a7e0-2989696576b1 -smbios type=1,manufacturer=Fedora 
Project,product=OpenStack 
Nova,version=13.1.0-1.el7,serial=a1068a93-24d1-4da2-8903-f9b8307fb0d8,uuid=8b6865eb-2118-430e-a7e0-2989696576b1,family=Virtual
 Machine -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-00000019/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on 
-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -drive 
file=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/disk.swap,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk1,id=virtio-disk1
 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:9c:cf:60,bus=pci.2,addr=0x1 
-chardev 
file,id=charserial0,path=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/console.log
 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -vnc 0.0.0.0:0 -k en-us 
-device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -device 
vfio-pci,host=03:00.0,id=hostdev0,bus=pci.2,addr=0x4 -device 
virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/3 (label charserial1)

It seems very reasonable to change the qemu command line.  However, I'm using 
OpenStack Nova to launch the instance, which then uses libvirt, which finally 
uses qemu.  So the issue is buried quite deep.

> I think that any instance of a q35 machine where the assigned device is
> placed on the conventional PCI bridge will create this scenario.  It's
> the default for attaching devices to a libvirt managed q35 VM AFAIK.

> Yes, I discussed with Laine from libvirt the possibility to assign
> devices to a PCIe port instead.

Yes, I see this as the issue as well.  The VM is being created by libvirt with 
the following XML.  The problem is that the device is auto-assigned to the 
pci-bridge by default.

<domain type="kvm">
  <uuid>8b6865eb-2118-430e-a7e0-2989696576b1</uuid>
  <name>instance-00000019</name>
  <memory>8388608</memory>
  <vcpu>8</vcpu>
    ... <snip> ...
  <devices>
    ... <snip> ...
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address bus="0x03" domain="0x0000" function="0x0" slot="0x00"/>
      </source>
    </hostdev>
    ... <snip> ...
  </devices>
</domain>

If I do a dumpxml, I get the following:

<domain type="kvm">
  <uuid>8b6865eb-2118-430e-a7e0-2989696576b1</uuid>
  <name>instance-00000019</name>
  <memory>8388608</memory>
  <vcpu>8</vcpu>
    ... <snip> ...
  <devices>
    ... <snip> ...
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' 
function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' 
function='0x0'/>
    </controller>
    <interface type='bridge'>
    ... <snip> ...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' 
function='0x0'/>
    </hostdev>
  </devices>
</domain>

If I manually change the domain as follows:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
-     <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
+     <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </hostdev>

I can reimport the domain, and the device attaches successfully to the 
pcie-root complex.  The problem is that I need to either manually specify this 
address in Nova or update libvirt's behavior to auto-assign PCIe devices to 
PCIe buses.  That would involve reading PCI configuration space to check 
whether the device is a PCIe device, and then attaching it to pcie-root, if 
possible.

>>>
>>> In fact, I've tried to fix this multiple times:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
>>>
>>> Yet the patch remains unapplied :(
>>
>> I thought it was in already. Maybe Michael can add it as part of the hard 
>> freeze.
>> And if the patch is applied, the tweak above wouldn't help, right Alex?
>
> The tweak Spenser suggested above should be unnecessary with my proposed
> patch applied.

> I think I finally understand this. The bus is not pcie -> we return
> from vfio_add_ext_cap without "breaking" the extended capabilities
> chain and the bare metal SR-IOV capability will be visible to guest.
> With your patch the PCI bridge will "interfere" and mask the extended
> configuration space completely.

>   Only now searching for that patch did I notice Michael's
> comment hidden at the bottom of his reply, which I assume is why it
> never got applied:
>
> https://patchwork.kernel.org/patch/8057411/
>

> I just saw it too! It seems Michael wants to cache this info
> in device instead of re-calculating it every time.

>> Anyway, the current behavior is clearly a bug, so QEMU hard freeze
>> should be irrelevant.  If anyone wants to take over the patch, feel
>> free.  Thanks,

> I suppose I can handle it, but sadly not for 2.7.
> If Spenser has some time now, he can help by testing and reviewing it 
> quickly :)

I'd be happy to help, though that patch would just break my current workaround 
more permanently ;-) .

Thanks,
Spenser





