Re: [Qemu-devel] Multi GPU passthrough via VFIO

qemu-devel

From:	Alex Williamson
Subject:	Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date:	Thu, 13 Feb 2014 17:33:51 -0700
On Fri, 2014-02-14 at 01:01 +0100, Maik Broemme wrote:
> Hi Alex,
> 
> Maik Broemme <address@hidden> wrote:
> > Hi Alex,
> > 
> > Alex Williamson <address@hidden> wrote:
> > > On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote:
> > > > What's interesting is the diff between the 1st and 2nd boot, i.e. lspci
> > > > run before each boot.  The only differences between the 1st and 2nd
> > > > start are:
> > > > 
> > > > --- 001-lspci.290x.before.1st.log       2014-02-07 01:13:41.498827928 +0100
> > > > +++ 004-lspci.290x.before.2nd.log       2014-02-07 01:16:50.966611282 +0100
> > > > @@ -24,7 +24,7 @@
> > > >                         ClockPM- Surprise- LLActRep- BwNot-
> > > >                 LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > > >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > -               LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > +               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > >                 DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> > > >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > > >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > @@ -33,13 +33,13 @@
> > > >                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> > > >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > > >         Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> > > > -               Address: 0000000000000000  Data: 0000
> > > > +               Address: 00000000fee00000  Data: 0000
> > > >         Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> > > >         Capabilities: [150 v2] Advanced Error Reporting
> > > >                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > > >                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > > >                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > > > -               CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > > > +               CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > > >                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > > >                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> > > >         Capabilities: [270 v1] #19
> > > > 
> > > > After that, if I do the suspend-to-RAM/resume trick, I get the same
> > > > lspci output as before the 1st boot again.
> > > 
> > > The Link Status change after X is stopped seems the most interesting to
> > > me.  The MSI change is probably explained by the MSI save/restore of the
> > > device, but should be harmless since MSI is disabled.  I'm a bit
> > > surprised the Correctable Error Status in the AER capability didn't get
> > > cleared.  I would have thought that a bus reset would have caused the
> > > link to retrain back to the original speed/width as well.  Let's check
> > > that we're actually getting a bus reset; try this in addition to the
> > > previous qemu patch.  This just enables debug logging for the bus reset
> > > function.  Thanks,
> > > 
> > 
> > Below is the output from 2 boots (boot with VGA, load fglrx, start X).
> > On the 2nd boot, X gets killed and an oops happens.
> > 
> > - 1st boot:
> > 
> > vfio: vfio_pci_hot_reset(0000:01:00.1) multi
> > vfio: 0000:01:00.1: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio:       0000:01:00.1 group 1
> > vfio: 0000:01:00.1 hot reset: Success
> > vfio: vfio_pci_hot_reset(0000:01:00.1) one
> > vfio: 0000:01:00.1: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio: vfio: found another in-use device 0000:01:00.0
> > vfio: vfio_pci_hot_reset(0000:01:00.0) one
> > vfio: 0000:01:00.0: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio:       0000:01:00.1 group 1
> > vfio: vfio: found another in-use device 0000:01:00.1
> > 
> > - 2nd boot:
> > 
> > vfio: vfio_pci_hot_reset(0000:01:00.1) multi
> > vfio: 0000:01:00.1: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio:       0000:01:00.1 group 1
> > vfio: 0000:01:00.1 hot reset: Success
> > vfio: vfio_pci_hot_reset(0000:01:00.1) one
> > vfio: 0000:01:00.1: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio: vfio: found another in-use device 0000:01:00.0
> > vfio: vfio_pci_hot_reset(0000:01:00.0) one
> > vfio: 0000:01:00.0: hot reset dependent devices:
> > vfio:       0000:01:00.0 group 1
> > vfio:       0000:01:00.1 group 1
> > vfio: vfio: found another in-use device 0000:01:00.1
> > 
> 
> Have you had a chance to look into it yet, or is there anything else I
> can help with?

According to the log, we're doing the bus reset on both the first and 2nd
boot (only the "multi" call is expected to succeed).  I'm surprised, then,
that the link doesn't retrain back to the original width.  You could try
forcing the link to retrain.  Look at the root port upstream from the GPU;
lspci -t is handy for this.  Run lspci on the root port to get the PCI
Express capability offset, then use setpci to set the Retrain Link bit.
For example:

# lspci -tv | grep NVIDIA
           +-07.0-[03]--+-00.0  NVIDIA Corporation GK106GL [Quadro K4000]
           |            \-00.1  NVIDIA Corporation GK106 HDMI Audio Controller

(upstream root port is 00:07.0)

# lspci -v -s 7.0 | grep Capabilities
        Capabilities: [40] Subsystem: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7
        Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot+), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>

(The PCI Express capability is at offset 0x90, and Link Control is at 0x10
from that: 0x90 + 0x10 = 0xa0.)

# setpci -s 7.0 a0.w
0040

(Retrain Link is bit 5, 0x20; OR'd with the value read, 0x0040, that
gives 0x0060.)

# setpci -s 7.0 a0.w=60

# lspci... did it work?

Try doing that after the first boot to see if you can get back to an x16
link.  If that works, we may need to add something in the kernel to do
it automatically around a bus reset.  Thanks,

Alex

> > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > index 8db182f..7fec259 100644
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -2927,6 +2927,10 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *hos
> > >              host1->slot == host2->slot && host1->function == host2->function);
> > >  }
> > >  
> > > +#undef DPRINTF
> > > +#define DPRINTF(fmt, ...) \
> > > +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> > > +
> > >  static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
> > >  {
> > >      VFIOGroup *group;
> > > @@ -3104,6 +3108,15 @@ out_single:
> > >      return ret;
> > >  }
> > >  
> > > +#undef DPRINTF
> > > +#ifdef DEBUG_VFIO
> > > +#define DPRINTF(fmt, ...) \
> > > +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> > > +#else
> > > +#define DPRINTF(fmt, ...) \
> > > +    do { } while (0)
> > > +#endif
> > > +
> > >  /*
> > >   * We want to differentiate hot reset of mulitple in-use devices vs hot reset
> > >   * of a single in-use device.  VFIO_DEVICE_RESET will already handle the case
> > > 
> > > 
> > 
> > --Maik
> > 
> 
> --Maik