qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] iommu emulation


From: Jintack Lim
Subject: Re: [Qemu-devel] iommu emulation
Date: Wed, 15 Feb 2017 18:25:26 -0500

On Wed, Feb 15, 2017 at 5:50 PM, Alex Williamson <address@hidden
> wrote:

> On Wed, 15 Feb 2017 17:05:35 -0500
> Jintack Lim <address@hidden> wrote:
>
> > On Tue, Feb 14, 2017 at 9:52 PM, Peter Xu <address@hidden> wrote:
> >
> > > On Tue, Feb 14, 2017 at 07:50:39AM -0500, Jintack Lim wrote:
> > >
> > > [...]
> > >
> > > > > > >> > I misunderstood what you said?
> > > > > > >
> > > > > > > I failed to understand why an vIOMMU could help boost
> performance.
> > > :(
> > > > > > > Could you provide your command line here so that I can try to
> > > > > > > reproduce?
> > > > > >
> > > > > > Sure. This is the command line to launch L1 VM
> > > > > >
> > > > > > qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
> > > > > > -m 12G -device intel-iommu,intremap=on,eim=off,caching-mode=on \
> > > > > > -drive file=/mydata/guest0.img,format=raw --nographic -cpu host
> \
> > > > > > -smp 4,sockets=4,cores=1,threads=1 \
> > > > > > -device vfio-pci,host=08:00.0,id=net0
> > > > > >
> > > > > > And this is for L2 VM.
> > > > > >
> > > > > > ./qemu-system-x86_64 -M q35,accel=kvm \
> > > > > > -m 8G \
> > > > > > -drive file=/vm/l2guest.img,format=raw --nographic -cpu host \
> > > > > > -device vfio-pci,host=00:03.0,id=net0
> > > > >
> > > > > ... here looks like these are command lines for L1/L2 guest, rather
> > > > > than L1 guest with/without vIOMMU?
> > > > >
> > > >
> > > > That's right. I thought you were asking about command lines for L1/L2
> > > guest
> > > > :(.
> > > > I think I made the confusion, and as I said above, I didn't mean to
> talk
> > > > about the performance of L1 guest with/without vIOMMO.
> > > > We can move on!
> > >
> > > I see. Sure! :-)
> > >
> > > [...]
> > >
> > > > >
> > > > > Then, I *think* above assertion you encountered would fail only if
> > > > > prev == 0 here, but I still don't quite sure why was that
> happening.
> > > > > Btw, could you paste me your "lspci -vvv -s 00:03.0" result in
> your L1
> > > > > guest?
> > > > >
> > > >
> > > > Sure. This is from my L1 guest.
> > >
> > > Hmm... I think I found the problem...
> > >
> > > >
> > > > address@hidden:~# lspci -vvv -s 00:03.0
> > > > 00:03.0 Network controller: Mellanox Technologies MT27500 Family
> > > > [ConnectX-3]
> > > > Subsystem: Mellanox Technologies Device 0050
> > > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > > Stepping- SERR+ FastB2B- DisINTx+
> > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort-
> > > > <MAbort- >SERR- <PERR- INTx-
> > > > Latency: 0, Cache Line Size: 64 bytes
> > > > Interrupt: pin A routed to IRQ 23
> > > > Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=1M]
> > > > Region 2: Memory at fe000000 (64-bit, prefetchable) [size=8M]
> > > > Expansion ROM at fea00000 [disabled] [size=1M]
> > > > Capabilities: [40] Power Management version 3
> > > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-
> > > )
> > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > Capabilities: [48] Vital Product Data
> > > > Product Name: CX354A - ConnectX-3 QSFP
> > > > Read-only fields:
> > > > [PN] Part number: MCX354A-FCBT
> > > > [EC] Engineering changes: A4
> > > > [SN] Serial number: MT1346X00791
> > > > [V0] Vendor specific: PCIe Gen3 x8
> > > > [RV] Reserved: checksum good, 0 byte(s) reserved
> > > > Read/write fields:
> > > > [V1] Vendor specific: N/A
> > > > [YA] Asset tag: N/A
> > > > [RW] Read-write area: 105 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 253 byte(s) free
> > > > [RW] Read-write area: 252 byte(s) free
> > > > End
> > > > Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
> > > > Vector table: BAR=0 offset=0007c000
> > > > PBA: BAR=0 offset=0007d000
> > > > Capabilities: [60] Express (v2) Root Complex Integrated Endpoint,
> MSI 00
> > > > DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > ExtTag- RBE+
> > > > DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
> > > > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > MaxPayload 256 bytes, MaxReadReq 4096 bytes
> > > > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> > > > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not
> > > > Supported
> > > > DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF
> > > Disabled
> > > > Capabilities: [100 v0] #00
> > >
> > > Here we have the head of ecap capability as cap_id==0, then when we
> > > boot the l2 guest with the same device, we'll first copy this
> > > cap_id==0 cap, then when adding the 2nd ecap, we'll probably encounter
> > > problem since pcie_find_capability_list() will thought there is no cap
> > > at all (cap_id==0 is skipped).
> > >
> > > Do you want to try this "hacky patch" to see whether it works for you?
> > >
> >
> > Thanks for following this up!
> >
> > I just tried this, and I got some different message this time.
> >
> > qemu-system-x86_64: vfio: Cannot reset device 0000:00:03.0, no available
> > reset mechanism.
> > qemu-system-x86_64: vfio: Cannot reset device 0000:00:03.0, no available
> > reset mechanism.
>
> Possibly very true, it might affect the reliability of the device in
> the l2 guest, but shouldn't prevent it from being assigned.  What's the
> reset mechanism on the physical device (lspci -vvv from host please).
>

Thanks, Alex.
This is from the host (L0).

08:00.0 Network controller: Mellanox Technologies MT27500 Family
[ConnectX-3]
Subsystem: Mellanox Technologies Device 0050
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 31
Region 0: Memory at d9f00000 (64-bit, non-prefetchable) [disabled] [size=1M]
Region 2: Memory at d5000000 (64-bit, prefetchable) [disabled] [size=8M]
Expansion ROM at d9000000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: CX354A - ConnectX-3 QSFP
Read-only fields:
[PN] Part number: MCX354A-FCBT
[EC] Engineering changes: A4
[SN] Serial number: MT1346X00624
[V0] Vendor specific: PCIe Gen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 105 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s
unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt-
ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not
Supported
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+,
EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number f4-52-14-03-00-15-51-10
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [18c v1] #19
Kernel driver in use: vfio-pci


Thanks,
>
> Alex
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]