qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/1] Fix device listener interface for PCI to PC


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 1/1] Fix device listener interface for PCI to PCI bridges
Date: Thu, 28 May 2015 23:03:07 +0200

On Thu, May 28, 2015 at 03:09:48PM -0400, Don Slutz wrote:
> On 05/28/15 08:28, Michael S. Tsirkin wrote:
> >On Thu, May 28, 2015 at 07:25:50AM -0400, Don Slutz wrote:
> >>On 05/28/15 05:30, Michael S. Tsirkin wrote:
> >>>On Thu, May 28, 2015 at 04:46:37AM -0400, Don Slutz wrote:
> >>>>The commit 707ff80021ccd7a68f4b3d2c44eebf87efbb41c4 assumed that a
> >>>>PCI device has a static address.  This is not true for PCI devices
> >>>>that are on the secondary bus of a PCI to PCI bridge.
> >>>>
> >>>>BIOS and/or guest OS can change the secondary bus number to a new
> >>>>value at any time.
> >>>>
> >>>>When a PCI to PCI bridge bridge is reset, the secondary bus number
> >>>>is set to zero.  Normally the BIOS will set it to 255 during PCI bus
> >>>>scanning so that only the PCI devices on the root can be accessed
> >>>>via bus 0.  Later it is set to a number between 1 and 254.
> >>>>
> >>>>Adjust xen_map_pcidev() to not register with Xen for secondary bus
> >>>>numbers 0 and 255.
> >>>>
> >>>>Extend the device listener interface to be called when ever the
> >>>>secondary bus number is set to a usable value.  This includes
> >>>>a call on unrealize if the secondary bus number was valid.
> >>>>
> >>>>Signed-off-by: Don Slutz <address@hidden>
> >>>>---
> >>>>
> >>>>Note: Right now hvmloader in Xen does not handle PCI to PCI bridges
> >>>>and so SeaBIOS does not have access to PCI device(s) on secondary
> >>>>buses.  How ever domU can setup the secondary bus(es) and this patch
> >>>>will restore access to these PCI devices.
> >>>>
> >>>>  hw/core/qdev.c              | 10 ++++++++++
> >>>>  hw/pci/pci_bridge.c         | 30 ++++++++++++++++++++++++++++++
> >>>>  include/hw/qdev-core.h      |  2 ++
> >>>>  include/hw/xen/xen_common.h | 31 +++++++++++++++++++++++++------
> >>>>  trace-events                |  1 +
> >>>>  5 files changed, 68 insertions(+), 6 deletions(-)
> >>>>
> >>>>diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> >>>>index b0f0f84..6a514ee 100644
> >>>>--- a/hw/core/qdev.c
> >>>>+++ b/hw/core/qdev.c
> >>>>@@ -239,6 +239,16 @@ void device_listener_unregister(DeviceListener 
> >>>>*listener)
> >>>>      QTAILQ_REMOVE(&device_listeners, listener, link);
> >>>>  }
> >>>>+void device_listener_realize(DeviceState *dev)
> >>>>+{
> >>>>+    DEVICE_LISTENER_CALL(realize, Forward, dev);
> >>>>+}
> >>>>+
> >>>>+void device_listener_unrealize(DeviceState *dev)
> >>>>+{
> >>>>+    DEVICE_LISTENER_CALL(unrealize, Forward, dev);
> >>>>+}
> >>>>+
> >>>>  static void device_realize(DeviceState *dev, Error **errp)
> >>>>  {
> >>>>      DeviceClass *dc = DEVICE_GET_CLASS(dev);
> >>>
> >>>This looks wrong.  Devices not accessible to config cycles are still
> >>>accessible e.g. to memory or IO.  It's not the same as unrealize.
> >>>
> >>>You need some other API that makes sense,
> >>>probably pci specific.
> >>>
> >>If I am understanding you correctly, you would like:
> >>
> >>void device_listener_change_pci_bus_num(PCIDevice *d, uint8_t oldbus)
> >>{
> >>     DEVICE_LISTENER_CALL(change_pci_bus_num, Forward, d, oldbus);
> >>}
> >>
> >I'm not sure what oldbus is but basically ok.
> 
> oldbus is the previous value that pci_bus_num(pci_dev->bus) would have
> returned.  Passing it avoids the:
> 
> +            pci_set_byte(d->config + PCI_SECONDARY_BUS, oldbus);
> +            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> +                                pci_bridge_unrealize_sub, NULL);
> +            pci_set_byte(d->config + PCI_SECONDARY_BUS, newbus);
> 
> hack.  At least xen wants to know the old value so that an "unrealize" (i.e. 
> unmap in xen terms) can be done for the old value and then 
> pci_bus_num(pci_dev->bus) can be done to get the new mapping.
> 
> 
> 
> >And it must be invoked whenever bus visibility changes,
> >not just its number.
> 
> This is not clear to me.  Maybe Paul Durrant has a better understanding.
> 
>  I look at this patch as a bug fix in that QEMU 2.2 and before "work" with
> pci-bridge.  It is only after Paul's change that it stops working.  Maybe
> part of what is not clear is that the new routine is called for the PCI
> devices on the secondary bus.
> 
> So at start using the example QEMU config (a Xen one):
> 
> /usr/lib/xen/bin/qemu-system-i386 \
>  -xen-domid \
>  13 \
>  -chardev \
>  socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-13,server,nowait \
>  -no-shutdown \
>  -mon \
>  chardev=libxl-cmd,mode=control \
>  -chardev \
>  socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-13,server,nowait \
>  -mon \
>  chardev=libxenstat-cmd,mode=control \
>  -nodefaults \
>  -name \
>  C63-min-tools \
>  -vnc \
>  127.0.0.1:0,to=99 \
>  -display \
>  none \
>  -serial \
>  pty \
>  -device \
>  cirrus-vga,vgamem_mb=8 \
>  -boot \
>  order=cda \
>  -device \
>  vmxnet3,id=nic0,netdev=net0,mac=00:0c:29:86:44:a0 \
>  -netdev \
>  type=tap,id=net0,ifname=vif13.0-emu,script=no,downscript=no \
>  -device \
>  e1000,id=nic1,netdev=net1,mac=00:0c:29:86:44:aa \
>  -netdev \
>  type=tap,id=net1,ifname=vif13.1-emu,script=no,downscript=no \
>  -machine \
>  xenfv \
>  -monitor \
>  pty \
>  -boot \
>  menu=on \
>  -device \
>  pci-bridge,chassis_nr=1,msi=on,id=pciBridge1.0,addr=0x11.0 \
>  -device \
>  pci-bridge,chassis_nr=2,msi=on,id=pciBridge5.0,multifunction=on,addr=0x15.0 \
>  -device \
>  pci-bridge,chassis_nr=3,msi=on,id=pciBridge6.0,multifunction=on,addr=0x16.0 \
>  -device \
>  
> pci-bridge,chassis_nr=4,msi=on,id=pciBridge7.0,multifunction=on,bus=pciBridge1.0,addr=0x17.0
>  \
>  -device \
>  pci-bridge,chassis_nr=5,msi=on,id=pciBridge8.0,multifunction=on,addr=0x18.0 \
>  -device \
>  pvscsi,id=scsi0,bus=pciBridge5.0,addr=0x1.0x0 \
>  -drive \
>  if=none,id=disk0-0,file=/dev/etherd/e500.1 \
>  -device \
>  scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
>  disk,bus=scsi0.0,scsi-id=0,drive=disk0-0 \
>  -device \
>  pvscsi,id=sas1,bus=pciBridge7.0,addr=0x1.0x0 \
>  -drive \
>  if=none,id=disk1-1,file=/dev/etherd/e500.2 \
>  -device \
>  scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
>  disk,bus=sas1.0,scsi-id=1,drive=disk1-1 \
>  -device \
>  pvscsi,id=scsi2,bus=pciBridge1.0,addr=0x3.0x0 \
>  -drive \
>  if=none,id=disk2-2,file=/dev/etherd/e500.3 \
>  -device \
>  scsi-disk,vendor=VMware,ver=1.0,product=Virtual \
>  disk,bus=scsi2.0,scsi-id=2,drive=disk2-2 \
>  -device \
>  
> e1000,id=nic2,netdev=net2,mac=00:0c:29:86:44:b4,bus=pciBridge5.0,addr=0x3.0x0 
> \
>  -netdev \
>  type=tap,id=net2,ifname=vif.2-emu,script=/etc/qemu-ifup,downscript=no \
>  -device \
>  
> vmxnet3,id=nic3,netdev=net3,mac=00:0c:29:86:44:be,bus=pciBridge5.0,addr=0x4.0x0
>  \
>  -netdev \
>  type=tap,id=net3,ifname=vif.3-emu,script=/etc/qemu-ifup,downscript=no \
>  -m 1016
> 
> 
> Which under Linux (with this patch)looks like:
> 
> 
> address@hidden ~]# lspci
> 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
> 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
> 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton
> II]
> 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
> 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev
> 01)
> 00:03.0 VGA compatible controller: Cirrus Logic GD 5446
> 00:11.0 PCI bridge: Red Hat, Inc. Device 0001
> 00:15.0 PCI bridge: Red Hat, Inc. Device 0001
> 00:16.0 PCI bridge: Red Hat, Inc. Device 0001
> 00:18.0 PCI bridge: Red Hat, Inc. Device 0001
> 01:03.0 SCSI storage controller: VMware PVSCSI SCSI Controller
> 01:17.0 PCI bridge: Red Hat, Inc. Device 0001
> 02:01.0 SCSI storage controller: VMware PVSCSI SCSI Controller
> 03:01.0 SCSI storage controller: VMware PVSCSI SCSI Controller
> 03:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet
> Controller (rev 03)
> 03:04.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
> address@hidden ~]# lspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-01.1
>            +-01.3
>            +-02.0
>            +-03.0
>            +-11.0-[01-02]--+-03.0
>            |               \-17.0-[02]----01.0
>            +-15.0-[03]--+-01.0
>            |            +-03.0
>            |            \-04.0
>            +-16.0-[04]--
>            \-18.0-[05]--
> address@hidden ~]#
> 
> 
> The issue that I am attempting to fix is that xen is told (without this
> patch):
> 
> 
> xen_map_pcidev id: 1 bdf: 00.00.00
> xen_map_pcidev id: 1 bdf: 00.01.00
> xen_map_pcidev id: 1 bdf: 00.01.01
> xen_map_pcidev id: 1 bdf: 00.01.03
> xen_map_pcidev id: 1 bdf: 00.02.00
> xen_map_pcidev id: 1 bdf: 00.03.00
> xen_map_pcidev id: 1 bdf: 00.04.00
> xen_map_pcidev id: 1 bdf: 00.05.00
> xen_map_pcidev id: 1 bdf: 00.11.00
> xen_map_pcidev id: 1 bdf: 00.15.00
> xen_map_pcidev id: 1 bdf: 00.16.00
> xen_map_pcidev id: 1 bdf: 00.17.00
> xen_map_pcidev id: 1 bdf: 00.18.00
> xen_map_pcidev id: 1 bdf: 00.01.00
> xen_map_pcidev id: 1 bdf: 00.01.00
> xen_map_pcidev id: 1 bdf: 00.03.00
> xen_map_pcidev id: 1 bdf: 00.03.00
> xen_map_pcidev id: 1 bdf: 00.04.00
> 
> 
> Now 00.17.00, 00.01.00, 00.01.00,  00.03.00, 00.03.00, and 00.04.00 are just
> wrong.
> 
> After the patch you get (when PCI_SECONDARY_BUS of 00.11.0 is set to 1):
> 
> 
> xen_map_pcidev id: 1 bdf: 01.03.00
> xen_map_pcidev id: 1 bdf: 01.17.00
> 
> 
> and then (when PCI_SECONDARY_BUS of 01.17.0 is set to 2):
> 
> 
> xen_map_pcidev id: 1 bdf: 02.01.00
> 
> 
> and then (when PCI_SECONDARY_BUS of 00.15.0 is set to 3):
> 
> 
> xen_map_pcidev id: 1 bdf: 03.01.00
> xen_map_pcidev id: 1 bdf: 03.03.00
> xen_map_pcidev id: 1 bdf: 03.04.00
> 
> 
> Which fixes the bug I ran into.  Did you want me to speed the time to open a
> QEMU bug?
> 
> 
> 
> >>>
> >>>>diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> >>>>index 40c97b1..042680d 100644
> >>>>--- a/hw/pci/pci_bridge.c
> >>>>+++ b/hw/pci/pci_bridge.c
> >>>>@@ -241,6 +241,18 @@ void pci_bridge_update_mappings(PCIBridge *br)
> >>>>      pci_bridge_region_cleanup(br, w);
> >>>>  }
> >>>>+static void pci_bridge_realize_sub(PCIBus *b, PCIDevice *d,
> >>>>+                                   void *opaque)
> >>>>+{
> >>>>+    device_listener_realize(DEVICE(d));
> >>>>+}
> >>>>+
> >>>>+static void pci_bridge_unrealize_sub(PCIBus *b, PCIDevice *d,
> >>>>+                                     void *opaque)
> >>>>+{
> >>>>+    device_listener_unrealize(DEVICE(d));
> >>>>+}
> >>>>+
> >>>>  /* default write_config function for PCI-to-PCI bridge */
> >>>>  void pci_bridge_write_config(PCIDevice *d,
> >>>>                               uint32_t address, uint32_t val, int len)
> >>>>@@ -248,6 +260,8 @@ void pci_bridge_write_config(PCIDevice *d,
> >>>>      PCIBridge *s = PCI_BRIDGE(d);
> >>>>      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> >>>>      uint16_t newctl;
> >>>>+    uint8_t oldbus = pci_get_byte(d->config + PCI_SECONDARY_BUS);
> >>>>+    uint8_t newbus;
> >>>>      pci_default_write_config(d, address, val, len);
> >>>>@@ -265,6 +279,22 @@ void pci_bridge_write_config(PCIDevice *d,
> >>>>          pci_bridge_update_mappings(s);
> >>>>      }
> >>>>+    newbus = pci_get_byte(d->config + PCI_SECONDARY_BUS);
> >>>>+    if (newbus != oldbus) {
> >>>>+        PCIBus *sec_bus = pci_bridge_get_sec_bus(s);
> >>>>+
> >>>>+        if (oldbus && oldbus != 255) {
> >>>>+            pci_set_byte(d->config + PCI_SECONDARY_BUS, oldbus);
> >>>>+            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> >>>>+                                pci_bridge_unrealize_sub, NULL);
> >>>>+            pci_set_byte(d->config + PCI_SECONDARY_BUS, newbus);
> >>>>+        }
> >>>>+        if (newbus && newbus != 255) {
> >>>>+            pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> >>>>+                                pci_bridge_realize_sub, NULL);
> >>>>+        }
> >>>>+    }
> >>>>+
> >>>
> >>>This is relying on undocumented assumptions and how specific firmware
> >>>works. There's nothing special about bus number 255, and 0 is not very
> >>>special either (except it happens to be the reset value).
> >>>
> >>Ok, using the above it would change to (almost):
> >>
> >>
> >>if (newbus != oldbus) {
> >>     pci_for_each_device(pci_bridge_get_sec_bus(s),
> >>                         pci_bus_num(sec_bus),
> >>                         device_listener_change_pci_bus_num,
> >>                         oldbus);
> >>}
> >Not really because it's not just secondary bus number.
> >Changing subordinate bus numbers can hide/unhide whole buses.
> >
> 
> You are right.  I have no idea what Paul Durrant was thinking about this
> case.
> And this would apply to PCI_SUBORDINATE_BUS not PCI_SECONDARY_BUS.
> 
> Since at QEMU 2.2 Xen sends all pci-config requests to QEMU, things
> "worked".

Why "worked"?


> It is not clear to me that the complexity of tracking bus
> visibility make sense.  Clearly you do.

Well what was the point of the change?
If you don't care that we get irrelevant config cycles why not
just send them all to QEMU?



> 
> >
> >>Would it be better to have:
> >>
> >>void device_listener_change_pci_bus_num(PCIBus *b, PCIDevice *d, void
> >>*opaque)
> >>{
> >>     uint8_t oldbus = (uint8_t)opaque;
> >>     DEVICE_LISTENER_CALL(change_pci_bus_num, Forward, d, oldbus);
> >>}
> >>
> >>So that the above works, or to add a function to convert args?
> >>
> >>>To know whether device is accessible to config cycles, you
> >>>really need to scan the hierarchy from root downwards.
> >>>
> >>Yes, that is correct.  However what I am doing here is not
> >>changing how QEMU checks if the device is accessible, but
> >>changing what pci config Xen sends to QEMU.  If the change
> >>to PCI_SECONDARY_BUS hides this PCI to PCI bridge, that is
> >>not an issue.
> >>
> >>
> >>    -Don Slutz
> >>
> >>
> >Imagine a bridge with secondary bus number 5
> >behind another one with subordinate numbers 1-3.
> >You should not send conf cycles for bus number 5 to qemu.
> >
> 
> That is correct.  How ever unless Paul Durrant has an example of more then 1
> "QEMU" where this would make a difference, the cases I am aware of are:
> 
> 1) Xen does not send it, and returns 0xffffffff (or smaller).
> 
> 2) QEMU returns 0xffffffff (or smaller).
> 
> I will grant that #1 is faster, but it also is only happening during start
> up and so I do not see the clear win to add more complex code to only do #1.
> 
>    -Don Slutz

It's not about faster. I assumed you need to get just the correct
stuff out to QEMU to gain the better security as
3996e85c1822e05c50250f8d2d1e57b6bea1229d claims.

-- 
MST



reply via email to

[Prev in Thread] Current Thread [Next in Thread]