qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/1] Fix device listener interface for PCI to PC


From: Don Slutz
Subject: Re: [Qemu-devel] [PATCH 1/1] Fix device listener interface for PCI to PCI bridges
Date: Thu, 28 May 2015 15:09:48 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 05/28/15 08:28, Michael S. Tsirkin wrote:
On Thu, May 28, 2015 at 07:25:50AM -0400, Don Slutz wrote:
On 05/28/15 05:30, Michael S. Tsirkin wrote:
On Thu, May 28, 2015 at 04:46:37AM -0400, Don Slutz wrote:
The commit 707ff80021ccd7a68f4b3d2c44eebf87efbb41c4 assumed that a
PCI device has a static address.  This is not true for PCI devices
that are on the secondary bus of a PCI to PCI bridge.

BIOS and/or guest OS can change the secondary bus number to a new
value at any time.

When a PCI to PCI bridge bridge is reset, the secondary bus number
is set to zero.  Normally the BIOS will set it to 255 during PCI bus
scanning so that only the PCI devices on the root can be accessed
via bus 0.  Later it is set to a number between 1 and 254.

Adjust xen_map_pcidev() to not register with Xen for secondary bus
numbers 0 and 255.

Extend the device listener interface to be called when ever the
secondary bus number is set to a usable value.  This includes
a call on unrealize if the secondary bus number was valid.

Signed-off-by: Don Slutz <address@hidden>
---

Note: Right now hvmloader in Xen does not handle PCI to PCI bridges
and so SeaBIOS does not have access to PCI device(s) on secondary
buses.  How ever domU can setup the secondary bus(es) and this patch
will restore access to these PCI devices.

  hw/core/qdev.c              | 10 ++++++++++
  hw/pci/pci_bridge.c         | 30 ++++++++++++++++++++++++++++++
  include/hw/qdev-core.h      |  2 ++
  include/hw/xen/xen_common.h | 31 +++++++++++++++++++++++++------
  trace-events                |  1 +
  5 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index b0f0f84..6a514ee 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -239,6 +239,16 @@ void device_listener_unregister(DeviceListener *listener)
      QTAILQ_REMOVE(&device_listeners, listener, link);
  }
+void device_listener_realize(DeviceState *dev)
+{
+    DEVICE_LISTENER_CALL(realize, Forward, dev);
+}
+
+void device_listener_unrealize(DeviceState *dev)
+{
+    DEVICE_LISTENER_CALL(unrealize, Forward, dev);
+}
+
  static void device_realize(DeviceState *dev, Error **errp)
  {
      DeviceClass *dc = DEVICE_GET_CLASS(dev);

This looks wrong.  Devices not accessible to config cycles are still
accessible e.g. to memory or IO.  It's not the same as unrealize.

You need some other API that makes sense,
probably pci specific.

If I am understanding you correctly, you would like:

void device_listener_change_pci_bus_num(PCIDevice *d, uint8_t oldbus)
{
     DEVICE_LISTENER_CALL(change_pci_bus_num, Forward, d, oldbus);
}

I'm not sure what oldbus is but basically ok.

oldbus is the previous value that pci_bus_num(pci_dev->bus) would have returned. Passing it avoids the:

+            pci_set_byte(d->config + PCI_SECONDARY_BUS, oldbus);
+            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+                                pci_bridge_unrealize_sub, NULL);
+            pci_set_byte(d->config + PCI_SECONDARY_BUS, newbus);

hack.  At least xen wants to know the old value so that an "unrealize" (i.e. unmap 
in xen terms) can be done for the old value and then pci_bus_num(pci_dev->bus) can be 
done to get the new mapping.



And it must be invoked whenever bus visibility changes,
not just its number.

This is not clear to me.  Maybe Paul Durrant has a better understanding.

I look at this patch as a bug fix in that QEMU 2.2 and before "work" with pci-bridge. It is only after Paul's change that it stops working. Maybe part of what is not clear is that the new routine is called for the PCI devices on the secondary bus.

So at start using the example QEMU config (a Xen one):

/usr/lib/xen/bin/qemu-system-i386 \
 -xen-domid \
 13 \
 -chardev \
 socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-13,server,nowait \
 -no-shutdown \
 -mon \
 chardev=libxl-cmd,mode=control \
 -chardev \
 socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-13,server,nowait \
 -mon \
 chardev=libxenstat-cmd,mode=control \
 -nodefaults \
 -name \
 C63-min-tools \
 -vnc \
 127.0.0.1:0,to=99 \
 -display \
 none \
 -serial \
 pty \
 -device \
 cirrus-vga,vgamem_mb=8 \
 -boot \
 order=cda \
 -device \
 vmxnet3,id=nic0,netdev=net0,mac=00:0c:29:86:44:a0 \
 -netdev \
 type=tap,id=net0,ifname=vif13.0-emu,script=no,downscript=no \
 -device \
 e1000,id=nic1,netdev=net1,mac=00:0c:29:86:44:aa \
 -netdev \
 type=tap,id=net1,ifname=vif13.1-emu,script=no,downscript=no \
 -machine \
 xenfv \
 -monitor \
 pty \
 -boot \
 menu=on \
 -device \
 pci-bridge,chassis_nr=1,msi=on,id=pciBridge1.0,addr=0x11.0 \
 -device \
 pci-bridge,chassis_nr=2,msi=on,id=pciBridge5.0,multifunction=on,addr=0x15.0 \
 -device \
 pci-bridge,chassis_nr=3,msi=on,id=pciBridge6.0,multifunction=on,addr=0x16.0 \
 -device \
 
pci-bridge,chassis_nr=4,msi=on,id=pciBridge7.0,multifunction=on,bus=pciBridge1.0,addr=0x17.0
 \
 -device \
 pci-bridge,chassis_nr=5,msi=on,id=pciBridge8.0,multifunction=on,addr=0x18.0 \
 -device \
 pvscsi,id=scsi0,bus=pciBridge5.0,addr=0x1.0x0 \
 -drive \
 if=none,id=disk0-0,file=/dev/etherd/e500.1 \
 -device \
 scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
 disk,bus=scsi0.0,scsi-id=0,drive=disk0-0 \
 -device \
 pvscsi,id=sas1,bus=pciBridge7.0,addr=0x1.0x0 \
 -drive \
 if=none,id=disk1-1,file=/dev/etherd/e500.2 \
 -device \
 scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
 disk,bus=sas1.0,scsi-id=1,drive=disk1-1 \
 -device \
 pvscsi,id=scsi2,bus=pciBridge1.0,addr=0x3.0x0 \
 -drive \
 if=none,id=disk2-2,file=/dev/etherd/e500.3 \
 -device \
 scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
 disk,bus=scsi2.0,scsi-id=2,drive=disk2-2 \
 -device \
 e1000,id=nic2,netdev=net2,mac=00:0c:29:86:44:b4,bus=pciBridge5.0,addr=0x3.0x0 \
 -netdev \
 type=tap,id=net2,ifname=vif.2-emu,script=/etc/qemu-ifup,downscript=no \
 -device \
 
vmxnet3,id=nic3,netdev=net3,mac=00:0c:29:86:44:be,bus=pciBridge5.0,addr=0x4.0x0 
\
 -netdev \
 type=tap,id=net3,ifname=vif.3-emu,script=/etc/qemu-ifup,downscript=no \
 -m 1016


Which under Linux (with this patch)looks like:


address@hidden ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
00:03.0 VGA compatible controller: Cirrus Logic GD 5446
00:11.0 PCI bridge: Red Hat, Inc. Device 0001
00:15.0 PCI bridge: Red Hat, Inc. Device 0001
00:16.0 PCI bridge: Red Hat, Inc. Device 0001
00:18.0 PCI bridge: Red Hat, Inc. Device 0001
01:03.0 SCSI storage controller: VMware PVSCSI SCSI Controller
01:17.0 PCI bridge: Red Hat, Inc. Device 0001
02:01.0 SCSI storage controller: VMware PVSCSI SCSI Controller
03:01.0 SCSI storage controller: VMware PVSCSI SCSI Controller
03:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
03:04.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
address@hidden ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0
           +-01.1
           +-01.3
           +-02.0
           +-03.0
           +-11.0-[01-02]--+-03.0
           |               \-17.0-[02]----01.0
           +-15.0-[03]--+-01.0
           |            +-03.0
           |            \-04.0
           +-16.0-[04]--
           \-18.0-[05]--
address@hidden ~]#


The issue that I am attempting to fix is that xen is told (without this patch):


xen_map_pcidev id: 1 bdf: 00.00.00
xen_map_pcidev id: 1 bdf: 00.01.00
xen_map_pcidev id: 1 bdf: 00.01.01
xen_map_pcidev id: 1 bdf: 00.01.03
xen_map_pcidev id: 1 bdf: 00.02.00
xen_map_pcidev id: 1 bdf: 00.03.00
xen_map_pcidev id: 1 bdf: 00.04.00
xen_map_pcidev id: 1 bdf: 00.05.00
xen_map_pcidev id: 1 bdf: 00.11.00
xen_map_pcidev id: 1 bdf: 00.15.00
xen_map_pcidev id: 1 bdf: 00.16.00
xen_map_pcidev id: 1 bdf: 00.17.00
xen_map_pcidev id: 1 bdf: 00.18.00
xen_map_pcidev id: 1 bdf: 00.01.00
xen_map_pcidev id: 1 bdf: 00.01.00
xen_map_pcidev id: 1 bdf: 00.03.00
xen_map_pcidev id: 1 bdf: 00.03.00
xen_map_pcidev id: 1 bdf: 00.04.00


Now 00.17.00, 00.01.00, 00.01.00, 00.03.00, 00.03.00, and 00.04.00 are just wrong.

After the patch you get (when PCI_SECONDARY_BUS of 00.11.0 is set to 1):


xen_map_pcidev id: 1 bdf: 01.03.00
xen_map_pcidev id: 1 bdf: 01.17.00


and then (when PCI_SECONDARY_BUS of 01.17.0 is set to 2):


xen_map_pcidev id: 1 bdf: 02.01.00


and then (when PCI_SECONDARY_BUS of 00.15.0 is set to 3):


xen_map_pcidev id: 1 bdf: 03.01.00
xen_map_pcidev id: 1 bdf: 03.03.00
xen_map_pcidev id: 1 bdf: 03.04.00


Which fixes the bug I ran into. Did you want me to speed the time to open a QEMU bug?




diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40c97b1..042680d 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -241,6 +241,18 @@ void pci_bridge_update_mappings(PCIBridge *br)
      pci_bridge_region_cleanup(br, w);
  }
+static void pci_bridge_realize_sub(PCIBus *b, PCIDevice *d,
+                                   void *opaque)
+{
+    device_listener_realize(DEVICE(d));
+}
+
+static void pci_bridge_unrealize_sub(PCIBus *b, PCIDevice *d,
+                                     void *opaque)
+{
+    device_listener_unrealize(DEVICE(d));
+}
+
  /* default write_config function for PCI-to-PCI bridge */
  void pci_bridge_write_config(PCIDevice *d,
                               uint32_t address, uint32_t val, int len)
@@ -248,6 +260,8 @@ void pci_bridge_write_config(PCIDevice *d,
      PCIBridge *s = PCI_BRIDGE(d);
      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
      uint16_t newctl;
+    uint8_t oldbus = pci_get_byte(d->config + PCI_SECONDARY_BUS);
+    uint8_t newbus;
pci_default_write_config(d, address, val, len); @@ -265,6 +279,22 @@ void pci_bridge_write_config(PCIDevice *d,
          pci_bridge_update_mappings(s);
      }
+ newbus = pci_get_byte(d->config + PCI_SECONDARY_BUS);
+    if (newbus != oldbus) {
+        PCIBus *sec_bus = pci_bridge_get_sec_bus(s);
+
+        if (oldbus && oldbus != 255) {
+            pci_set_byte(d->config + PCI_SECONDARY_BUS, oldbus);
+            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+                                pci_bridge_unrealize_sub, NULL);
+            pci_set_byte(d->config + PCI_SECONDARY_BUS, newbus);
+        }
+        if (newbus && newbus != 255) {
+            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+                                pci_bridge_realize_sub, NULL);
+        }
+    }
+

This is relying on undocumented assumptions and how specific firmware
works. There's nothing special about bus number 255, and 0 is not very
special either (except it happens to be the reset value).

Ok, using the above it would change to (almost):


if (newbus != oldbus) {
     pci_for_each_device(pci_bridge_get_sec_bus(s),
                         pci_bus_num(sec_bus),
                         device_listener_change_pci_bus_num,
                         oldbus);
}
Not really because it's not just secondary bus number.
Changing subordinate bus numbers can hide/unhide whole buses.


You are right. I have no idea what Paul Durrant was thinking about this case.
And this would apply to PCI_SUBORDINATE_BUS not PCI_SECONDARY_BUS.

Since at QEMU 2.2 Xen sends all pci-config requests to QEMU, things "worked". It is not clear to me that the complexity of tracking bus visibility make sense. Clearly you do.



Would it be better to have:

void device_listener_change_pci_bus_num(PCIBus *b, PCIDevice *d, void
*opaque)
{
     uint8_t oldbus = (uint8_t)opaque;
     DEVICE_LISTENER_CALL(change_pci_bus_num, Forward, d, oldbus);
}

So that the above works, or to add a function to convert args?

To know whether device is accessible to config cycles, you
really need to scan the hierarchy from root downwards.

Yes, that is correct.  However what I am doing here is not
changing how QEMU checks if the device is accessible, but
changing what pci config Xen sends to QEMU.  If the change
to PCI_SECONDARY_BUS hides this PCI to PCI bridge, that is
not an issue.


    -Don Slutz


Imagine a bridge with secondary bus number 5
behind another one with subordinate numbers 1-3.
You should not send conf cycles for bus number 5 to qemu.


That is correct. How ever unless Paul Durrant has an example of more then 1 "QEMU" where this would make a difference, the cases I am aware of are:

1) Xen does not send it, and returns 0xffffffff (or smaller).

2) QEMU returns 0xffffffff (or smaller).

I will grant that #1 is faster, but it also is only happening during start up and so I do not see the clear win to add more complex code to only do #1.

   -Don Slutz




reply via email to

[Prev in Thread] Current Thread [Next in Thread]