Re: [Qemu-devel] VFIO VGA test branches


From: Knut Omang
Subject: Re: [Qemu-devel] VFIO VGA test branches
Date: Mon, 20 May 2013 23:11:59 +0200

On Sun, 2013-05-19 at 22:15 -0600, Alex Williamson wrote:
> On Sun, 2013-05-19 at 17:35 +0200, Knut Omang wrote:
> > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > Hi all,
> > > > 
> > > > Perfect timing from my perspective, thanks Alex!
> > > > 
> > > > I spent the better part of the weekend testing your branches on a new 
> > > > system 
> > > > I just put together for this purpose, results below..
> > > > 
> > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > ...
> > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > > 
> > > > System setup: 
> > > > 
> > > > - Fedora 18 on
> > > > - Gigabyte Z77X-UD5H motherboard
> > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > - 2 discrete graphics cards:
> > > > 
> > > > lspci | egrep 'VGA|Audio'
> > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > > 
> > > > Short summary:
> > > > 
> > > > - Once I got past a few time-consuming obstacles explained below
> > > >    - the graphics part of the graphics/HDMI audio passthrough seems to 
> > > >      work perfectly on both discrete graphics cards 
> > > >      (though so far only one at a time and with some minor issues, see 
> > > >      below)
> > > >    - no success with the HDMI audio yet (ideas for further 
> > > >      investigation appreciated!)
> > > 
> > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > support.
> > 
> > In my first tries, Windows was just using a generic
> > VGA driver, which still seems to work perfectly across reboots
> > and at full screen resolution (1920x1200).
> > However, after installing the AMD Catalyst driver stack, Windows 7
> > now consistently gets a BSOD from the graphics driver on boot,
> > with the message:
> > 
> > "Attempt to reset the display driver and recover from timeout failed"
> > - a picture of the BSOD screen attached.
> 
> I've seen that BSOD before, but I don't know how to reproduce it.  It
> seems like I haven't seen it with the PCI bus reset code.  I'm running
> version 13.1 of the catalyst driver, you?

I first tried the install CD that came with the card (v.13-045), then
upgraded to the latest from AMD, Catalyst v.13.4, which appears to be
driver v.12.104. Behaviour was similar for both. This was with a plain
Windows 7 install from my SP1 DVD.

With most of the recommended Windows updates and the latest Catalyst
driver, the BSOD is gone. Instead I see the initial VGA boot screen and
the Windows logo, then the monitor syncs but shows no display, and then
the guest reboots into recovery mode. (If I apply all updates, Windows
never seems able to recover from the last reboot.)

I have also tried without kvm, and with vnc or spice graphics in
addition, but in those cases Windows does not seem able to allocate MMIO
resources for both adapters, so I haven't been able to test the Catalyst
driver as a secondary Windows display.

> > I attach the corresponding vfio log, where I added some timing code to
> > make it easier to see when the BSOD happens (there are 2 seconds of
> > silence in the log before the VM reboots; I believe this is at
> > 09:28:32-34 in the log).
> 
> Yep, looks like that's where windows starts the BSOD.
> 
> > Similar behaviour both just after reboot/power cycle of the host and
> > subsequent VM boot attempts.
> > 
> > This is still with the HD7700 as passed through device, but after a
> > motherboard firmware upgrade (to F14) which did not seem to affect the
> > observed behaviour on Windows prior to Catalyst install or with Linux
> > guest, neither did it fix the bug in selecting primary devices as I 
> > was hoping for.
> > 
> > Let me know if you have ideas for further debugging this,
> 
> I don't have any great ideas since I don't know how to reproduce the
> timeout.  Double/triple check that you're using the correct
> vfio-vga-reset branches in both qemu and kernel
> 
> # grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
> # grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c

[Matches in both..]
I do believe I have used the right branches all along.
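
For anyone repeating this check on their own checkouts, the two greps can be wrapped in a small helper. This is only a sketch: the function names has_bus_reset and check_trees are hypothetical, and the tree locations are whatever you pass in. The symbol and the file paths are the ones from Alex's commands above.

```shell
#!/bin/sh
# has_bus_reset FILE
# Return 0 if FILE mentions VFIO_DEVICE_PCI_BUS_RESET, non-zero otherwise.
# (Hypothetical helper name, not part of qemu or the kernel.)
has_bus_reset() {
    grep -q VFIO_DEVICE_PCI_BUS_RESET "$1" 2>/dev/null
}

# check_trees QEMU_TREE LINUX_TREE
# Report on both checkouts; return 1 if either file lacks the symbol.
check_trees() {
    rc=0
    for f in "$1/hw/misc/vfio.c" "$2/drivers/vfio/pci/vfio_pci.c"; do
        if has_bus_reset "$f"; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
            rc=1
        fi
    done
    return $rc
}
```

Usage would be something like `check_trees qemu.git linux.git`. A match only shows that the symbol is present in the source tree, not that the running kernel and qemu binary were built from it.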

> I didn't see anything telling in your DMAR either.  The system seems to
> have just one DRHD that includes everything, so I'm not sure why you saw
> any behavior change from igfx_off.  Thanks,

After the firmware upgrade, I tried again with the integrated graphics
enabled, this time with more success: I am now able to get a GUI Fedora
console on the integrated graphics, but I see some colorful artifacts
there during VGA startup on one of the Radeon cards. They go away after
toggling to another console and back.

It seems I slightly misled you with the DMAR table, sorry about that.
The table I posted was taken with the igfx disabled. With the igfx
enabled I see one more hardware unit, dedicated to the igfx, if I am
interpreting it correctly (attached).
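
A quick way to compare the number of remapping hardware units between two DMAR dumps is sketched below. It assumes the table was copied from /sys/firmware/acpi/tables/DMAR and disassembled with `iasl -d`, and that iasl labels each DRHD subtable with the text "Hardware Unit Definition". That label is my assumption about the disassembler output, and count_drhd is a hypothetical helper name.

```shell
#!/bin/sh
# count_drhd FILE
# Print the number of DRHD (remapping hardware unit) subtables found in an
# iasl disassembly of the DMAR table.
# Assumption: each DRHD appears on a line containing
# "Hardware Unit Definition" in the .dsl output.
count_drhd() {
    grep -c 'Hardware Unit Definition' "$1"
}
```

Running `count_drhd DMAR_igfx.dsl` against the attached file, and then against a dump taken with the igfx disabled, should show whether the second unit only appears when the igfx is enabled.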

The HD7700 and the HD6450 behave very similarly, and both still start
and display Windows fine if I disable the Catalyst driver.

Knut

> Alex
> 
> > > > - Contrary to address@hidden I had no success with using pci-assign for 
> > > > VGA
> > > >   with a standard fedora 18 kernel and fairly recent qemu, nor with 
> > > > your branches, 
> > > > 
> > > > Details:
> > > > 
> > > > - I started off with the required kernel parameter 'intel_iommu=on' + 
> > > > necessary parameters for disabling radeon
> > > >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated 
> > > > graphics as primary display
> > > >    - this caused the system to freeze (with color artifacts on the 
> > > > console)
> > > > 
> > > > - In my naivety and because of the "i" in igfx I tried both with 
> > > >   'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off' 
> > > >   and a full set of combinations of vfio, cards, kernels and pci-assign 
> > > >   before I suspected that iommu support was turned off for **all** 
> > > >   graphics cards with igfx_off
> > > 
> > > I'm not sure why this is, looks like the code only tries to turn it off
> > > when only graphics is under the remapping device.  We'd probably need to
> > > see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
> > > 
> > > > - The solution was to have integrated graphics turned off in the BIOS, 
> > > > and 'intel_iommu=on':
> > > > 
> > > > - iommu groups:
> > > > 
> > > > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > > > total 0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> > > > 
> > > > - eg. both the VGA/HDMI Audio pairs + the two root ports they are 
> > > > plugged into are in the same group:
> > > 
> > > Ick.  Intel has been pretty good about advertising ACS support on their
> > > root ports.  I wonder if this is an oversight or if they are actually
> > > not isolated from each other.
> > > 
> > > > # lspci -n
> > > > ...
> > > > 01:00.0 0300: 1002:683f
> > > > 01:00.1 0403: 1002:aab0
> > > > 02:00.0 0300: 1002:6779
> > > > 02:00.1 0403: 1002:aa98
> > > > ...
> > > > 
> > > > modprobe vfio_pci
> > > > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > > > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > > > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > 
> > > > # lsusb 
> > > > ...
> > > > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch 
> > > > Keyboard
> > > > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 
> > > > [B110 Optical USB Mouse]
> > > > ...
> > > > 
> > > > - I also applied your suggested patch to the quirk function in VFIO 
> > > > (see below)
> > > > 
> > > > - Here is a (trimmed for readability) command line I successfully used 
> > > > to boot from the Windows 7 install DVD, 
> > > >   notice the cd and disk device descriptions and the bus parameter - I 
> > > > struggled a while with that 
> > > >   until I came across a comment by Gerd Hoffmann here: 
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> > > > 
> > > > 
> > > > qemu-kvm -M q35 \
> > > >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> > > >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> > > >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> > > >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> > > >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> > > >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> > > >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> > > >   -device ide-drive,drive=ivm03,bus=ide.0 \
> > > >   -device ide-cd,drive=cd,bus=ide.1 \
> > > >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> > > >   -enable-kvm \
> > > >   -device usb-host,hostbus=1,hostaddr=8 \
> > > >   -device usb-host,hostbus=1,hostaddr=4
> > > > 
> > > > - Both graphics cards seem to have a ROM, but only the HD6450 lent 
> > > > itself to "scraping". 
> > > 
> > > Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> > > vice versa?  The boot VGA ROM is handled in a special way and what you
> > > really get is the shadow copy, which isn't what we want.
> > > 
> > > > Anyway, supplying it to vfio did not seem to make any difference.
> > > > 
> > > > find /sys -name rom
> > > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > > > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > > > ...
> > > > 
> > > > Some observations and remaining unresolved issues:
> > > > 
> > > > - VFIO patch:
> > > >   Initially (while still running with igfx_off) I observed exactly the 
> > > > same behaviour as address@hidden
> > > >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up 
> > > > spinning with repeated calls to
> > > >   vfio_ati_3c3_quirk_read and repeated logs: 
> > > >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> > > >   I patched up accordingly with 
> > > > 
> > > > 
> > > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > > index da0e5f9..a361d06 100644
> > > > --- a/hw/misc/vfio.c
> > > > +++ b/hw/misc/vfio.c
> > > > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> > > >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> > > >                                    addr + quirk->data.base_offset, size);
> > > >  
> > > > -    if (data == quirk->data.address_match) {
> > > > +    if (1 || data == quirk->data.address_match) {
> > > >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> > > >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> > > >      }
> > > > 
> > > > 
> > > >   This of course did not help much until I actually got the iommu 
> > > >   enabled for the radeons (similar "repeated patterns" as deniv reported),
> > > >   but what I have observed after I got it working is that if 
> > > >   I disable the patch above, things do not go as well: the Fedora VM 
> > > >   comes up with VGA and the Fedora boot screen, then goes blank when 
> > > >   switching to X.
> > > 
> > > Hmm, I think we'd probably have better luck making that unconditional
> > > until we have reason to do otherwise.
> > > 
> > > > - The fact that the iommu group now extends across all my available 
> > > >   graphics devices makes it difficult to get the radeon (or catalyst) 
> > > >   driver to use the other card, since the vfio_pci driver needs to hold it.
> > > >   Not a complete showstopper since the vesa driver comes up with 
> > > >   1024x768.
> > > >   Might it be a good idea to have an override option (exception list or 
> > > >   similar?) to allow vfio_pci to be less restrictive about owning the 
> > > >   whole group, i.e. to allow functionality over security in such a case? 
> > > >   This of course is further complicated by the need for graphics drivers 
> > > >   to be disabled/enabled already at the kernel prompt.
> > > 
> > > We have a quirk in the kernel that enables us to whitelist devices, but
> > > yes, there is no flexibility in this w/o modifying the code and
> > > rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> > > the example above w/ pci_dev_dma_source - function can just return 1)
> > > 
> > > > - There seems to be a bug in the (version F8) UEFI BIOS on the 
> > > >   motherboard. The BIOS offers (undocumented) a full range of 
> > > >   selections of which PCIe (or PCIe 1x) graphics card to use as 
> > > >   primary, but any selection other than the first PCIe 16x slot has 
> > > >   no effect and the motherboard reverts to the first slot. To be able 
> > > >   to test both cards, I had to put the card under test into the 
> > > >   second (8x) PCIe slot. I am waiting for feedback from Gigabyte on 
> > > >   possible fixes for this in newer BIOSes.
> > > > 
> > > > - The ultimate goal is to try to consolidate some older Windows 
> > > > desktops as "seats" 
> > > >   on the new system, using the discrete graphics with HDMI/Displayport 
> > > > audio. 
> > > >   With the HD7700 moved to the second PCIe slot I tested both Windows 
> > > > and 
> > > >   Linux guests to try to get some sound through the HDMI audio device. 
> > > >   Windows complains that no usable device is available. On Linux 
> > > > (Fedora 18, KDE desktop), 
> > > >   the system settings -> multimedia dialogue never opens up which seems 
> > > > to indicate that 
> > > >   PulseAudio has problems communicating with the passed through device 
> > > > (?), 
> > > >   any hints/pointers here appreciated. From the vfio log it seems at 
> > > > least
> > > >   config space is accessed ok.
> > > > 
> > > > - There also seems to be issues with radeon and intel_iommu=on - if I 
> > > > try 
> > > >   to enable modesetting and normal X support for the radeon cards, X 
> > > > fails to start.
> > > > 
> > > > - It would be nice if the integrated graphics could be used as the host 
> > > >   primary display. I would be happy if someone has any hints as to 
> > > >   if/how the igfx_off option could be extended/modified to only affect 
> > > >   iommu operation on selected device(s), if at all possible.
> > > 
> > > Let's see what we can discover from your DMAR.  Also send along sudo
> > > lspci -vvv.  Thanks,
> > > 
> > > Alex
> > > 
> > 
> > 
> 
> 
> 

Attachment: DMAR_igfx.dsl
Description: Text Data

