Re: [Qemu-devel] AMD video card passthrough reset issues
From: Lucio Andrés Illanes Albornoz
Subject: Re: [Qemu-devel] AMD video card passthrough reset issues
Date: Tue, 2 Dec 2014 17:14:56 +0100
On Tue, 02 Dec 2014 08:26:20 -0700 Alex Williamson <address@hidden> wrote:
> All of the Bonaire-based AMD GPUs seem to have issues with reset
> (R7790, R7 260/X). I've tried to engage AMD on this, but haven't gotten
> any response on this topic yet. For devices like this that don't
> support any kind of function level reset (FLR), VFIO will try to do a
> PCI bus reset on guest reboot. This is as close as we can get to how
> the BIOS resets the device on a host reboot. Unfortunately on these
> cards there seems to be some sort of disconnect between the PCI bus
> interface reset and resetting the rest of the GPU. I believe I've even
> seen cases where a PCI bus reset appears to have no effect on the GPU
> when running in VGA mode. My best guess is that some firmware running
> in the card isn't clearing itself on reset, and attempting to reload it
> causes errors. Note that a guest can be reset multiple times and the
> device continues to work if the guest is restricted to standard VGA
> drivers (in VGA passthrough mode of course).
My experience is consistent with that description: the bus reset initiated
through the hotplug reset interface appears to leave some part(s) of my video
card in a state the AMD driver is not prepared to handle. From the second
bootup (i.e. the first VM reboot) onwards the card is completely gone: endless
IOTLB_INV_TIMEOUT and `Completion-Wait loop timed out' kernel messages, no VGA
output at all when doing primary passthrough (which I no longer require, since
vgacon isn't too fond of it), and possibly even hangs when running lspci(8)
afterwards, if I remember correctly.
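A quick way to tell whether the host has entered that state is to count the two
messages in the kernel log. The following is only a sketch: the function name
count_iommu_timeouts is made up for illustration, and the two patterns are
simply the strings quoted above, matched wherever they occur in a line.

```shell
# Hypothetical helper: count kernel log lines (read from stdin) that contain
# either of the two symptom strings mentioned above.
count_iommu_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so callers should not treat that as an error.
    grep -c -e 'IOTLB_INV_TIMEOUT' -e 'Completion-Wait loop timed out'
}
```

Typical use would be `dmesg | count_iommu_timeouts`, run once after the first
guest boot and again after the first guest reboot.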
I had originally intended to have QEMU trace MMIO in general and PCI{,-E}
bus/device traffic (as relevant) in order to establish what arcane incantations
Windows could possibly be performing, but that only ended up showing me PCI
configuration space reads and IRQ reassignments upon disabling my video card.
WinDbg/Kd* is far too slow to facilitate tracing PCI{,-E} traffic through
breakpoints, and were I to possess the Windows Research Kernel source code,
speaking completely hypothetically here, I would then unfortunately have to
find out that QEMU w/ KVM plus AMD's drivers don't get along too well w/
Windows Server 2003. I then figured that having drivers/vfio/pci/* produce that
information should ultimately lead me towards the solution, but I can't quite
get to that just yet; the remove/rescan dance is the only thing that,
pragmatically speaking, actually works for me at present.
> In your experiment with removing and rescanning the device, are you
> simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'? Thanks,
Yes.
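
For the record, the sequence can be wrapped in a small shell function. This is
a minimal sketch rather than a vetted tool: the function name
pci_remove_rescan, the SYSFS_PCI and SETTLE_SECS overrides and the example
address 0000:01:00.0 are made up for illustration; only the `remove' and
`rescan' sysfs attributes come from the exchange above. On a real system it
has to run as root, with the guest shut down.

```shell
# Hypothetical helper: remove one PCI function from the host, then rescan
# the bus so the kernel rediscovers it.
# $1           PCI address in domain:bus:dev.fn form, e.g. 0000:01:00.0
# SYSFS_PCI    optional override of /sys/bus/pci (useful for dry runs)
# SETTLE_SECS  optional pause between remove and rescan (default 1)
pci_remove_rescan() {
    bdf=$1
    root=${SYSFS_PCI:-/sys/bus/pci}
    dev="$root/devices/$bdf"

    if [ -z "$bdf" ]; then
        echo "usage: pci_remove_rescan <domain:bus:dev.fn>" >&2
        return 2
    fi
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi

    # Equivalent to 'echo 1 > remove' in the device's sysfs directory.
    echo 1 > "$dev/remove" || return 1
    # Arbitrary pause before asking the kernel to rediscover the device.
    sleep "${SETTLE_SECS:-1}"
    echo 1 > "$root/rescan"
}
```

It would be called as `pci_remove_rescan 0000:01:00.0' before starting the
guest again; a card that also exposes an HDMI audio function (e.g. at .1)
would presumably need the same treatment for that function.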