qemu-devel

Re: [Qemu-devel] Multi GPU passthrough via VFIO


From: Maik Broemme
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date: Fri, 7 Feb 2014 21:17:34 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Alex,

Alex Williamson <address@hidden> wrote:
> On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote:
> > The interesting part is the diff between the 1st and 2nd boot, with lspci
> > run before each boot. The only differences between the 1st start and
> > the 2nd start are:
> > 
> > --- 001-lspci.290x.before.1st.log   2014-02-07 01:13:41.498827928 +0100
> > +++ 004-lspci.290x.before.2nd.log   2014-02-07 01:16:50.966611282 +0100
> > @@ -24,7 +24,7 @@
> >                     ClockPM- Surprise- LLActRep- BwNot-
> >             LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >                     ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > -           LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > +           LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >             DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> >             DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> >             LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > @@ -33,13 +33,13 @@
> >             LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> >                      EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >     Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> > -           Address: 0000000000000000  Data: 0000
> > +           Address: 00000000fee00000  Data: 0000
> >     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> >     Capabilities: [150 v2] Advanced Error Reporting
> >             UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >             UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >             UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > -           CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > +           CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> >             CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> >             AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> >     Capabilities: [270 v1] #19
> > 
> > After that, if I do the suspend-to-ram / resume trick, I again get the
> > lspci output from before the 1st boot.
> 
> The Link Status change after X is stopped seems the most interesting to
> me.  The MSI change is probably explained by the MSI save/restore of the
> device, but should be harmless since MSI is disabled.  I'm a bit
> surprised the Correctable Error Status in the AER capability didn't get
> cleared.  I would have thought that a bus reset would have caused the
> link to retrain back to the original speed/width as well.  Let's check
> that we're actually getting a bus reset, try this in addition to the
> previous qemu patch.  This just enables debug logging for the bus reset
> function.  Thanks,
> 
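
To check the LnkSta speed/width across boots without eyeballing the lspci
diff, something like the following could be used. This is only a minimal
sketch in C: parse_lnksta(), link_degraded() and struct link_status are
hypothetical names, not part of QEMU or pciutils, and the parser assumes
the "LnkSta: Speed <n>GT/s, Width x<n>" layout seen in the lspci output
quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed view of one "LnkSta:" line from `lspci -vv` (hypothetical type). */
struct link_status {
    double speed_gts;  /* e.g. 5.0 for "Speed 5GT/s" */
    int width;         /* e.g. 16 for "Width x16" */
};

/*
 * Parse a line such as
 *   "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- ..."
 * Returns 1 on success, 0 if the line carries no parsable LnkSta field.
 * "LnkSta2:" lines do not match because of the colon in the search key.
 */
static int parse_lnksta(const char *line, struct link_status *out)
{
    const char *p;

    assert(line != NULL && out != NULL);
    p = strstr(line, "LnkSta:");
    if (p == NULL) {
        return 0;
    }
    return sscanf(p, "LnkSta: Speed %lfGT/s, Width x%d",
                  &out->speed_gts, &out->width) == 2;
}

/* Returns 1 if the link came back slower or narrower than before. */
static int link_degraded(const struct link_status *before,
                         const struct link_status *after)
{
    return after->speed_gts < before->speed_gts ||
           after->width < before->width;
}
```

Feeding it the two LnkSta lines from the diff (5GT/s x16 before the 1st
boot, 2.5GT/s x1 before the 2nd) makes link_degraded() report that the
link did not retrain back to its original speed/width.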

Below is the output from 2 boots, each time with VGA, loading fglrx and
starting X. (The 2nd time X got killed and the oops happened.)

- 1st boot:

vfio: vfio_pci_hot_reset(0000:01:00.1) multi
vfio: 0000:01:00.1: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio:   0000:01:00.1 group 1
vfio: 0000:01:00.1 hot reset: Success
vfio: vfio_pci_hot_reset(0000:01:00.1) one
vfio: 0000:01:00.1: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio: vfio: found another in-use device 0000:01:00.0
vfio: vfio_pci_hot_reset(0000:01:00.0) one
vfio: 0000:01:00.0: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio:   0000:01:00.1 group 1
vfio: vfio: found another in-use device 0000:01:00.1

- 2nd boot:

vfio: vfio_pci_hot_reset(0000:01:00.1) multi
vfio: 0000:01:00.1: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio:   0000:01:00.1 group 1
vfio: 0000:01:00.1 hot reset: Success
vfio: vfio_pci_hot_reset(0000:01:00.1) one
vfio: 0000:01:00.1: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio: vfio: found another in-use device 0000:01:00.0
vfio: vfio_pci_hot_reset(0000:01:00.0) one
vfio: 0000:01:00.0: hot reset dependent devices:
vfio:   0000:01:00.0 group 1
vfio:   0000:01:00.1 group 1
vfio: vfio: found another in-use device 0000:01:00.1

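To compare the two runs mechanically, the debug lines can be counted per
message type. The following is a minimal sketch in C; struct
reset_summary, summarize_line() and summarize_log() are hypothetical
helper names, and the matched strings are simply the ones that appear in
the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts of the three message types seen in the vfio debug output. */
struct reset_summary {
    int attempts;   /* "vfio_pci_hot_reset(...)" lines */
    int successes;  /* "... hot reset: Success" lines */
    int blocked;    /* "found another in-use device ..." lines */
};

/* Update the summary with one log line; unrelated lines are ignored. */
static void summarize_line(const char *line, struct reset_summary *s)
{
    assert(line != NULL && s != NULL);
    if (strstr(line, "vfio_pci_hot_reset(") != NULL) {
        s->attempts++;
    } else if (strstr(line, "hot reset: Success") != NULL) {
        s->successes++;
    } else if (strstr(line, "found another in-use device") != NULL) {
        s->blocked++;
    }
}

/* Summarize an array of n log lines. */
static struct reset_summary summarize_log(const char **lines, size_t n)
{
    struct reset_summary s = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        summarize_line(lines[i], &s);
    }
    return s;
}
```

Applied to either boot above, this gives 3 attempts, 1 success and 2
"found another in-use device" lines, which matches the observation that
the two logs are identical.
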
> Alex
> 
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index 8db182f..7fec259 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -2927,6 +2927,10 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *hos
>              host1->slot == host2->slot && host1->function == host2->function);
>  }
>  
> +#undef DPRINTF
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> +
>  static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
>  {
>      VFIOGroup *group;
> @@ -3104,6 +3108,15 @@ out_single:
>      return ret;
>  }
>  
> +#undef DPRINTF
> +#ifdef DEBUG_VFIO
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
>  /*
>   * We want to differentiate hot reset of mulitple in-use devices vs hot reset
>   * of a single in-use device.  VFIO_DEVICE_RESET will already handle the case
> 
> 
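
The patch above turns on logging for a single function by redefining
DPRINTF just before it and restoring the DEBUG_VFIO-dependent definition
right after. As a standalone illustration of that pattern, here is a
minimal sketch: quiet_step() and traced_step() are hypothetical functions
that do not exist in hw/misc/vfio.c, and the `## __VA_ARGS__` form is a
GNU extension, so GCC or Clang is assumed.

```c
#include <stdio.h>

/* Default definition: compiled out unless DEBUG_VFIO is defined. */
#ifdef DEBUG_VFIO
#define DPRINTF(fmt, ...) \
    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
    do { } while (0)
#endif

/* Hypothetical function that keeps the default (normally silent) macro. */
static int quiet_step(int x)
{
    DPRINTF("quiet_step(%d)\n", x);
    return x + 1;
}

/* Force logging on for everything defined from here ... */
#undef DPRINTF
#define DPRINTF(fmt, ...) \
    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)

/* Hypothetical function that always logs to stderr. */
static int traced_step(int x)
{
    DPRINTF("traced_step(%d)\n", x);
    return x + 1;
}

/* ... and restore the default so later code is unaffected. */
#undef DPRINTF
#ifdef DEBUG_VFIO
#define DPRINTF(fmt, ...) \
    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
    do { } while (0)
#endif
```

Because macros are expanded where they are used, only the code between
the two #undef lines picks up the always-on definition; the rest of the
file behaves as before.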

--Maik


