qemu-devel

From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
Date: Thu, 25 Feb 2016 16:18:35 +0200

On Thu, Feb 25, 2016 at 03:05:08PM +0100, Laszlo Ersek wrote:
> On 02/25/16 14:30, Michael S. Tsirkin wrote:
> > On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
> >> On 02/25/16 13:44, Laszlo Ersek wrote:
> >>> Hi,
> >>>
> >>> On 02/25/16 12:57, Michael S. Tsirkin wrote:
> >>>> ----- Forwarded message from Igor Mammedov <address@hidden> -----
> >>>>
> >>>> Date: Thu, 11 Feb 2016 16:16:05 +0100
> >>>> From: Igor Mammedov <address@hidden>
> >>>> To: "Michael S. Tsirkin" <address@hidden>
> >>>> To: address@hidden
> >>>> Subject: on pci rebalancing
> >>>> Message-ID: <address@hidden>
> >>>> In-Reply-To: <address@hidden>
> >>>>
> >>>>>>>> For PCI rebalancing to work on Windows, one has to provide a working PCI
> >>>>>>>> driver; otherwise the OS will ignore the device when rebalancing happens
> >>>>>>>> and might map something else over the ignored BAR.
> >>>>>>>
> >>>>>>> Does it disable the BAR then? Or just move it elsewhere?  
> >>>>>> It doesn't; it just blindly ignores the BAR's existence and maps a BAR of
> >>>>>> another device (one that has a driver) over it.
> >>>>>
> >>>>> Interesting. On classical PCI this is a forbidden configuration.
> >>>>> Maybe we do something that confuses Windows?
> >>>>> Could you tell me how to reproduce this behaviour?
> >>>> #cat > t << EOF
> >>>> pci_update_mappings_del
> >>>> pci_update_mappings_add
> >>>> EOF
> >>>>
> >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm \
> >>>>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>>>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>>>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>>>
> >>>> Wait until the OS boots and note the BARs programmed for ivshmem;
> >>>> in my case it was
> >>>>    01:01.0 0,0xfe800000+0x100
> >>>> Then execute the script and watch the pci_update_mappings* trace events:
> >>>>
> >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>>>
> >>>> Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing: Windows
> >>>> unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> >>>> and then programs new BARs, where
> >>>>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>>> creates a BAR that overlaps the ivshmem BAR.
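
To spot such overlaps without eyeballing the trace, the add/del events can be replayed mechanically. The following is a minimal sketch, assuming trace lines shaped like the pci_update_mappings_add line shown above; only the last two fields of each line are used, so any prefix in front of the event name does not matter. The helper name find_bar_overlaps is hypothetical. It remembers which BARs are currently mapped, forgets a BAR on a _del event, and on every _add event reports each still-mapped BAR whose address range intersects the new one.

```shell
#!/bin/sh
# find_bar_overlaps (hypothetical helper): read pci_update_mappings_add/_del
# trace lines on stdin and report each newly added BAR that overlaps a BAR
# which is still mapped. Assumes lines shaped like
#   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
# Limitations: I/O and memory BARs are not told apart, and awk numbers are
# doubles, so addresses above 2^53 lose precision.
find_bar_overlaps() {
    awk '
    # Convert a hex string such as 0xfe800000 to a number (POSIX awk has no strtonum).
    function hex(s,    n, i) {
        s = tolower(s)
        sub(/^0x/, "", s)
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    /pci_update_mappings_(add|del)/ {
        bdf = $(NF - 1)                  # bus:slot.function, e.g. 01:11.0
        split($NF, f, "[,+]")            # f[1]=BAR index, f[2]=base, f[3]=size
        key = bdf " BAR" f[1]
        if ($0 ~ /pci_update_mappings_del/) {
            delete base[key]
            delete size[key]
            next
        }
        b = hex(f[2]); s = hex(f[3])
        # Half-open ranges [b, b+s) and [base, base+size) intersect iff
        # each one starts before the other one ends.
        for (k in base)
            if (k != key && b < base[k] + size[k] && base[k] < b + s)
                printf "overlap: %s [%s+%s] with %s\n", key, f[2], f[3], k
        base[key] = b
        size[key] = s
    }'
}
```

For example, after capturing the trace output in a file (assuming it is written to QEMU's stderr, as with the default log backend), `find_bar_overlaps < trace.log` would list the 01:11.0 BAR0 mapping against the ivshmem BAR at 01:01.0.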
> >>>
> >>> Michael informed me of this on IRC (and forwarded this email to me). I 
> >>> hope to start a new thread with my response. (I also re-edited the subject
> >>> fully.)
> >>>
> >>> So, to summarize what I said on IRC first. The situation where firmware 
> >>> recognizes and enables a PCI device, hands control to the OS, and then 
> >>> the OS lacks a driver for the PCI device, is completely normal and 
> >>> expected. For UEFI specifically, I can name a general argument and a 
> >>> specific argument.
> >>>
> >>> The general argument is that actions that need to be taken in 
> >>> ExitBootServices() callbacks do not include clearing IO or MMIO decode 
> >>> bits in PCI device command registers. Command register manipulation 
> >>> happens when a PCI device driver (that conforms to the UEFI driver model) 
> >>> *binds* or *unbinds* a device. And unbinding a device is not possible in 
> >>> the ExitBootServices() callback, minimally because such callbacks are 
> >>> forbidden from modifying the memory map -- but unbinding would release 
> >>> allocated memory.
> >>>
> >>> So what we use such callbacks for is aborting in-flight, outstanding
> >>> DMA-like transfers. Resetting virtio devices is an example of this (think
> >>> of outstanding receive requests for virtio-net).
> >>>
> >>> Now let's move on to the specific argument I mentioned above. The 
> >>> Graphics Output Protocol (GOP) is a UEFI abstraction that was 
> >>> specifically designed for the case when the operating system doesn't
> >>> (yet) have a display driver installed, but the user obviously still has
> >>> to use the display somehow. The GOP is most frequently provided on
> >>> top of an EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP 
> >>> driver" is a UEFI driver that drives a PCI device. In short, the driver 
> >>> provides the GOP on top of a PCI device.
> >>>
> >>> Now, the GOP is supposed to communicate the pixel format and the frame 
> >>> buffer base address for the currently active graphics mode to the 
> >>> software that consumes the GOP. This includes UEFI applications of course 
> >>> (think a boot loader putting up a splash screen or an animation), but
> >>> importantly, the runtime OS is *also* supposed to inherit these 
> >>> characteristics from boot services time. The OS can then use simple 
> >>> unaccelerated MMIO writes to display things on the screen, until the 
> >>> user installs an accelerated driver.
> >>>
> >>> (Concrete example: this is why you can see *anything at all* on the 
> >>> screen, when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL 
> >>> display, before installing the QXL WDDM driver in the guest.)
> >>>
> >>> Clearly, the frame buffer base address communicated through the GOP 
> >>> points into one of the MMIO BARs of the PCI device. If, at 
> >>> ExitBootServices(), MMIO decoding were disabled for the PCI device that 
> >>> underlies the GOP, that would *completely* defeat the GOP design. The 
> >>> OS's attempt to poke at those MMIO addresses would be futile -- and in 
> >>> fact the OS has no idea what PCI device (if any) the framebuffer is 
> >>> supposed to be related to. This is the jurisdiction of the OS-level 
> >>> display driver -- if one exists and is installed.
> >>>
> >>> So, this is a Windows bug in my opinion. Even if there is no OS-level
> >>> driver, a PCI device is fully expected to be decoding resources if the
> >>> firmware brought it up.
> >>>
> >>> --*--
> >>>
> >>> Okay, so Michael asked me to try to reproduce the above with OVMF, and 
> >>> see what happens. Unfortunately I'm not really knowledgeable about 
> >>> ivshmem, hotplug, et cetera. Let me instead tell Igor about using OVMF.
> >>>
> >>> (1) Please follow the instructions on Gerd's page 
> >>> <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64" 
> >>> package.
> >>>
> >>> (2) Create a separate directory for testing. In this directory, run the 
> >>> following command:
> >>>
> >>>   cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
> >>>
> >>> Also create a disk image for your new guest, etc.
> >>>
> >>> (3) Use the following command line snippet to work with OVMF:
> >>>
> >>>      qemu-system-x86_64 \
> >>>        -machine accel=kvm \
> >>>        -smp cpus=2 \
> >>>        -m 2048 \
> >>>        \
> >>>        -debugcon file:ovmf.debug.log \
> >>>        -global isa-debugcon.iobase=0x402 \
> >>>        \
> >>>        -device qxl-vga \
> >>>        \
> >>>        -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
> >>>        -drive if=pflash,format=raw,unit=1,file=myvars.fd \
> >>>        \
> >>>        [your options here]
> >>>
> >>> You can of course customize the # of VCPUs, memory size, disks, CD-ROMs, 
> >>> network, and so on.
> >>>
> >>> Recommended: when you use the -device option to add the disk and the 
> >>> CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the 
> >>> "bootindex" property. OVMF will adhere to the boot order. It is 
> >>> recommended to set bootindex=0 for your main disk, bootindex=1 for your 
> >>> OS installer CD-ROM, and *no* bootindex for your virtio-win driver disk. 
> >>> This way at first boot (with no OS installed) OVMF will boot the 
> >>> installer CD-ROM. Further boots (with the same command line) will boot 
> >>> the installed OS.
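
Putting the boot-order recommendation into a concrete command line could look roughly like the sketch below. It is only an illustration: the image names (disk.qcow2, win-installer.iso, virtio-win.iso), the choice of virtio-blk-pci for the main disk and ide-cd for the CD-ROMs, and the hypothetical run_guest wrapper with its DRY_RUN switch are assumptions, not part of the instructions above.

```shell
#!/bin/sh
# run_guest (hypothetical wrapper): start an OVMF guest with an explicit boot
# order. File names below are placeholders; adjust them to your setup.
# With DRY_RUN=1 the arguments are printed one per line instead of run.
OVMF_CODE=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd

run_guest() {
    set -- \
        -machine accel=kvm -smp cpus=2 -m 2048 \
        -debugcon file:ovmf.debug.log -global isa-debugcon.iobase=0x402 \
        -device qxl-vga \
        -drive if=pflash,format=raw,unit=0,readonly=on,file="$OVMF_CODE" \
        -drive if=pflash,format=raw,unit=1,file=myvars.fd \
        -drive if=none,id=disk0,format=qcow2,file=disk.qcow2 \
        -device virtio-blk-pci,drive=disk0,bootindex=0 \
        -drive if=none,id=cd0,media=cdrom,format=raw,file=win-installer.iso \
        -device ide-cd,drive=cd0,bootindex=1 \
        -drive if=none,id=cd1,media=cdrom,format=raw,file=virtio-win.iso \
        -device ide-cd,drive=cd1
    # Main disk: bootindex=0; installer CD: bootindex=1;
    # virtio-win driver CD: deliberately no bootindex.

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        qemu-system-x86_64 "$@"
    fi
}
```

Calling `run_guest` starts the VM, while `DRY_RUN=1 run_guest` only lists the arguments, which makes it easy to double-check the bootindex assignments before the first boot.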
> >>>
> >>> Caveat: I never used the -snapshot option with OVMF virtual machines; it 
> >>> might or might not work.
> >>>
> >>> Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows 
> >>> running on OVMF many months ago, but I can't tell off-hand if it will 
> >>> work right now.
> >>
> >> I should also mention that you might not be able to reproduce the same
> >> situation with the "ivshmem" device. Namely, if there is no UEFI driver
> >> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
> >> and IO decoding bits will *never* be set. As I said, command register
> >> massaging is the jurisdiction of the individual UEFI driver that
> >> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
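
One way to check which decode bits are actually enabled for a device is to read its PCI Command register (config-space offset 0x04): bit 0 enables I/O space decoding, bit 1 memory space decoding and bit 2 bus mastering. In a Linux guest, for instance, `setpci -s 01:01.0 COMMAND` prints the register in hex, and `lspci -vv` summarizes the same bits on its `Control:` line. The small decoder below is a hypothetical helper, not an existing tool; keep in mind that the OS may have changed the register since the firmware handed over control.

```shell
#!/bin/sh
# decode_pci_command (hypothetical helper): decode the enable bits of a PCI
# Command register value given in hex, with or without a 0x prefix,
# e.g. the output of "setpci -s 01:01.0 COMMAND".
decode_pci_command() {
    v=${1#0x}
    cmd=$(( 0x$v ))               # force hex; "0107" would otherwise be octal
    io=$(( cmd & 1 ))             # bit 0: I/O Space enable
    mem=$(( (cmd >> 1) & 1 ))     # bit 1: Memory Space enable
    bm=$(( (cmd >> 2) & 1 ))      # bit 2: Bus Master enable
    printf 'io=%d mem=%d busmaster=%d\n' "$io" "$mem" "$bm"
}
```

For example, `decode_pci_command "$(setpci -s 01:01.0 COMMAND)"` would show whether the device at 01:01.0 is decoding I/O and memory accesses at all.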
> >>
> >> Therefore you should probably try to reproduce the issue with another
> >> PCI device type that OVMF has a driver for, but Windows has none
> >> (installed at least). I'm quite hard pressed to name such a device type,
> >> unfortunately. :(
> > 
> > virtio?
> 
> ... was my first thought as well, but OVMF at the moment supports only
> legacy (0.9.5) virtio-pci devices

Oh. We'll have to fix that too :(

> (and virtio-mmio only on AARCH64) --
> those don't have MMIO BARs, only IO BARs.

Well that's not exactly true - there is an MSI-X BAR.
Maybe OVMF does not enable that, though.

> Theoretically the Windows overlap issue should be triggerable with IO
> BARs just the same (resource - resource, right?), but I doubt it will be
> reproducible in practice.
> 
> Laszlo
> 
> >> Perhaps one of the more obscure emulated NICs could work in place of
> >> ivshmem. (The iPXE option ROMs provide UEFI drivers for those.)
> >>
> >> Thanks
> >> Laszlo


