From: Bharata B Rao
Subject: Re: [Qemu-devel] [PATCH qemu v16 19/19] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
Date: Wed, 1 Jun 2016 09:02:30 +0530

On Fri, May 27, 2016 at 11:19 AM, Bharata B Rao <address@hidden> wrote:
> On Fri, May 27, 2016 at 10:14 AM, David Gibson <address@hidden> wrote:
>> On Tue, May 17, 2016 at 11:02:48AM +0530, Bharata B Rao wrote:
>>> On Mon, May 16, 2016 at 11:55 AM, Alexey Kardashevskiy <address@hidden> wrote:
>>> > On 05/13/2016 06:41 PM, Bharata B Rao wrote:
>>> >>
>>> >> On Wed, May 4, 2016 at 12:22 PM, Alexey Kardashevskiy <address@hidden> wrote:
>>> >
>>> >
>>> >>
>>> >>> +
>>> >>> +    avail = SPAPR_PCI_DMA_MAX_WINDOWS - spapr_phb_get_active_win_num(sphb);
>>> >>> +
>>> >>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> >>> +    rtas_st(rets, 1, avail);
>>> >>> +    rtas_st(rets, 2, max_window_size);
>>> >>> +    rtas_st(rets, 3, pgmask);
>>> >>> +    rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
>>> >>> +
>>> >>> +    trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, pgmask);
>>> >>> +    return;
>>> >>> +
>>> >>> +param_error_exit:
>>> >>> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> >>> +}
>>> >>> +
>>> >>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
>>> >>> +                                          sPAPRMachineState *spapr,
>>> >>> +                                          uint32_t token, uint32_t nargs,
>>> >>> +                                          target_ulong args,
>>> >>> +                                          uint32_t nret, target_ulong rets)
>>> >>> +{
>>> >>> +    sPAPRPHBState *sphb;
>>> >>> +    sPAPRTCETable *tcet = NULL;
>>> >>> +    uint32_t addr, page_shift, window_shift, liobn;
>>> >>> +    uint64_t buid;
>>> >>> +
>>> >>> +    if ((nargs != 5) || (nret != 4)) {
>>> >>> +        goto param_error_exit;
>>> >>> +    }
>>> >>> +
>>> >>> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>>> >>> +    addr = rtas_ld(args, 0);
>>> >>> +    sphb = spapr_pci_find_phb(spapr, buid);
>>> >>> +    if (!sphb || !sphb->ddw_enabled) {
>>> >>> +        goto param_error_exit;
>>> >>> +    }
>>> >>> +
>>> >>> +    page_shift = rtas_ld(args, 3);
>>> >>> +    window_shift = rtas_ld(args, 4);
>>> >>
>>> >>
>>> >> The kernel has a bug due to which a wrong window_shift gets returned
>>> >> here. I have posted a possible fix here:
>>> >> https://patchwork.ozlabs.org/patch/621497/
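>>> >>
>>> >> (For context: the guest derives window_shift from its highest
>>> >> guest-physical memory address, so how memory is represented in the DT
>>> >> feeds directly into this RTAS argument. A minimal sketch of that
>>> >> derivation -- my illustration of the idea, not the actual kernel code:
>>> >>
>>> >>     /* Round the highest RAM address up to a power of two and take
>>> >>      * its log2; this is roughly what the guest passes as
>>> >>      * window_shift to ibm,create-pe-dma-window. */
>>> >>     static unsigned int window_shift_for(uint64_t max_ram_addr)
>>> >>     {
>>> >>         unsigned int shift = 0;
>>> >>
>>> >>         while ((1ULL << shift) < max_ram_addr) {
>>> >>             shift++;
>>> >>         }
>>> >>         return shift;
>>> >>     }
>>> >>
>>> >> A DT that mis-reports the maximum memory address therefore yields a
>>> >> wrong window_shift, which is the bug being discussed.)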
>>> >>
>>> >> I have tried to work around this issue in QEMU too
>>> >> https://lists.nongnu.org/archive/html/qemu-ppc/2016-04/msg00226.html
>>> >>
>>> >> But the above workaround involves changing the memory representation
>>> >> in the DT.
>>> >
>>> >
>>> > What is wrong with this workaround?
>>>
>>> The above workaround results in different representations of memory in
>>> the DT before and after it is applied.
>>>
>>> Currently for -m 2G, -numa node,nodeid=0,mem=1G -numa
>>> node,nodeid=1,mem=0.5G, we will have the following nodes in DT:
>>>
>>> memory@0
>>> memory@40000000
>>> ibm,dynamic-reconfiguration-memory
>>>
>>> ibm,dynamic-memory will have only DR LMBs:
>>>
>>> [root@... ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-memory
>>> 0000000 0000 000a 0000 0000 8000 0000 8000 0008
>>> 0000010 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000020 9000 0000 8000 0009 0000 0000 ffff ffff
>>> 0000030 0000 0000 0000 0000 a000 0000 8000 000a
>>> 0000040 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000050 b000 0000 8000 000b 0000 0000 ffff ffff
>>> 0000060 0000 0000 0000 0000 c000 0000 8000 000c
>>> 0000070 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000080 d000 0000 8000 000d 0000 0000 ffff ffff
>>> 0000090 0000 0000 0000 0000 e000 0000 8000 000e
>>> 00000a0 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 00000b0 f000 0000 8000 000f 0000 0000 ffff ffff
>>> 00000c0 0000 0000 0000 0001 0000 0000 8000 0010
>>> 00000d0 0000 0000 ffff ffff 0000 0000 0000 0001
>>> 00000e0 1000 0000 8000 0011 0000 0000 ffff ffff
>>> 00000f0 0000 0000
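>>>
>>> (To read the dump: the first 4 bytes are the LMB count (0x0000000a ==
>>> 10 here), followed by one 24-byte entry per 256MB LMB. The layout below
>>> is my annotation based on the LoPAPR definition of ibm,dynamic-memory;
>>> all fields are big-endian:
>>>
>>>     struct of_drconf_cell {
>>>         uint64_t base_addr;  /* guest-physical base, e.g. 0x80000000 */
>>>         uint32_t drc_index;  /* e.g. 0x80000008 for the first LMB   */
>>>         uint32_t reserved;
>>>         uint32_t aa_index;   /* associativity index; 0xffffffff == unset */
>>>         uint32_t flags;
>>>     };
>>>
>>> So each entry above is a DR LMB with no associativity assigned yet.)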
>>>
>>> The memory region looks like this:
>>>
>>> memory-region: system
>>>   0000000000000000-ffffffffffffffff (prio 0, RW): system
>>>     0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram
>>>     0000000080000000-000000011fffffff (prio 0, RW): hotplug-memory
>>>
>>> After this workaround, all of this changes as below:
>>>
>>> memory@0
>>> ibm,dynamic-reconfiguration-memory
>>>
>>> All LMBs in ibm,dynamic-memory:
>>>
>>> [root@... ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-memory
>>>
>>> 0000000 0000 0010 0000 0000 0000 0000 8000 0000
>>> 0000010 0000 0000 0000 0000 0000 0080 0000 0000
>>> 0000020 1000 0000 8000 0001 0000 0000 0000 0000
>>> 0000030 0000 0080 0000 0000 2000 0000 8000 0002
>>> 0000040 0000 0000 0000 0000 0000 0080 0000 0000
>>> 0000050 3000 0000 8000 0003 0000 0000 0000 0000
>>> 0000060 0000 0080 0000 0000 4000 0000 8000 0004
>>> 0000070 0000 0000 0000 0001 0000 0008 0000 0000
>>> 0000080 5000 0000 8000 0005 0000 0000 0000 0001
>>> 0000090 0000 0008 0000 0000 6000 0000 8000 0006
>>> 00000a0 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 00000b0 7000 0000 8000 0007 0000 0000 ffff ffff
>>> 00000c0 0000 0000 0000 0000 8000 0000 8000 0008
>>> 00000d0 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 00000e0 9000 0000 8000 0009 0000 0000 ffff ffff
>>> 00000f0 0000 0000 0000 0000 a000 0000 8000 000a
>>> 0000100 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000110 b000 0000 8000 000b 0000 0000 ffff ffff
>>> 0000120 0000 0000 0000 0000 c000 0000 8000 000c
>>> 0000130 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000140 d000 0000 8000 000d 0000 0000 ffff ffff
>>> 0000150 0000 0000 0000 0000 e000 0000 8000 000e
>>> 0000160 0000 0000 ffff ffff 0000 0000 0000 0000
>>> 0000170 f000 0000 8000 000f 0000 0000 ffff ffff
>>> 0000180 0000 0000
>>>
>>> The hotplug memory region gets a new address range now:
>>>
>>> memory-region: system
>>>   0000000000000000-ffffffffffffffff (prio 0, RW): system
>>>     0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram
>>>     0000000060000000-00000000ffffffff (prio 0, RW): hotplug-memory
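>>>
>>> (The new base follows from where the hotplug region is placed relative
>>> to boot RAM. A rough sketch of the two placements -- my reading of the
>>> two mtree dumps above, not the actual QEMU code:
>>>
>>>     /* Before: base is ram_size rounded up to 1 GiB, so 1.5G of RAM
>>>      * gives 0x80000000. With the workaround: base is ram_size itself,
>>>      * i.e. 0x60000000. */
>>>     static uint64_t hotplug_base(uint64_t ram_size, bool workaround)
>>>     {
>>>         const uint64_t align = 1ULL << 30; /* 1 GiB */
>>>
>>>         if (workaround) {
>>>             return ram_size;
>>>         }
>>>         return (ram_size + align - 1) & ~(align - 1);
>>>     }
>>>
>>> This matches the 0x80000000 -> 0x60000000 shift above.)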
>>>
>>>
>>> So when a guest that was booted with an older QEMU is migrated to a
>>> newer QEMU that has this workaround, it will start exhibiting the above
>>> changes after the first reboot post-migration.
>>
>> Ok.. why is that bad?
>>
>>> If the user has done memory hotplug by explicitly specifying an address
>>> at the source, then even migration would fail, because the addr
>>> specified at the target will not be part of the hotplug-memory range.
>>
>> Sorry, not really following the situation you're describing here.
>
> In the original case, where the hotplug region was this:
> 0000000080000000-000000011fffffff (prio 0, RW): hotplug-memory
>
> one could hotplug a DIMM at a specified address like this:
>
> (qemu) object_add memory-backend-ram,id=ram0,size=256M
> (qemu) device_add pc-dimm,id=dimm0,memdev=ram0,addr=0x100000000
> (qemu) info mtree
> 0000000080000000-000000011fffffff (prio 0, RW): hotplug-memory
>       0000000100000000-000000010fffffff (prio 0, RW): ram0
>
> Now if this guest has to be migrated to a target where we have this
> workaround enabled, then the target QEMU started with
>
> -incoming ... -object memory-backend-ram,id=ram0,size=256M -device pc-dimm,id=dimm0,memdev=ram0,addr=0x100000000
>
> will fail because addr=0x100000000 isn't part of the hotplug-memory
> region at the target.
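>
> (The failing check is essentially a range test against the target's
> hotplug-memory region. A minimal sketch of the idea -- not the exact
> QEMU code, names are made up:
>
>     /* A DIMM's fixed addr must lie entirely inside the hotplug region,
>      * so moving or shrinking that region invalidates addrs recorded on
>      * the source. */
>     static bool dimm_addr_valid(uint64_t addr, uint64_t size,
>                                 uint64_t hp_base, uint64_t hp_size)
>     {
>         return addr >= hp_base && addr + size <= hp_base + hp_size;
>     }
>
> With the workaround, hotplug-memory at the target is
> 0x60000000-0xffffffff, so addr=0x100000000 fails this test.)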

And I verified that libvirt does indeed always update the XML with an
explicit slot and addr for the DIMM device, and the same values are used
at the target during migration, even when the user hasn't explicitly
specified a slot or addr when hotplugging the memory DIMM. So when addr
is used explicitly like this, any change in the hotplug memory region
layout will break migration.

So is that a good enough reason to put the workaround in the DDW code itself?

Regards,
Bharata.


