From: David Gibson
Subject: Re: [Qemu-ppc] [Qemu-devel] [RFC PATCH 06/10] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
Date: Fri, 15 Aug 2014 14:20:52 +1000
User-agent: Mutt/1.5.23 (2014-03-12)

On Fri, Aug 15, 2014 at 01:09:20PM +1000, Alexey Kardashevskiy wrote:
> On 08/15/2014 10:04 AM, David Gibson wrote:
> > On Thu, Aug 14, 2014 at 06:29:50PM +1000, Alexey Kardashevskiy wrote:
> >> On 08/13/2014 01:27 PM, David Gibson wrote:
> >>> On Tue, Aug 12, 2014 at 05:25:29PM +1000, Alexey Kardashevskiy wrote:
> >>>> On 08/12/2014 11:45 AM, David Gibson wrote:
> >>>>> On Thu, Jul 31, 2014 at 07:34:10PM +1000, Alexey Kardashevskiy wrote:
> >>> [snip]
> >>>>> The function of this is kind of unclear.  I'm assuming this is
> >>>>> filtering the supported page sizes reported by the PHB by the possible
> >>>>> page sizes based on host page size or other constraints.  Is that
> >>>>> right?
> >>>>>
> >>>>> I think you'd be better off folding the whole double loop into the
> >>>>> fixmask function.
> >>>>>
> >>>>>> +
> >>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>> +    rtas_st(rets, 1, windows_available);
> >>>>>> +    /* Return the maximum number, as if all RAM were 4K pages */
> >>>>>> +    rtas_st(rets, 2, ram_size >> SPAPR_TCE_PAGE_SHIFT);
> >>>>>
> >>>>> I'm assuming this is the allowed size of the dynamic windows.
> >>>>> Shouldn't that be reported by a PHB callback, rather than hardcoded
> >>>>> here?
> >>>>
> >>>> Why the PHB? This is DMA memory. @ram_size is the upper limit; we could
> >>>> make it bigger only with memory hotplug (which we do not have), and the
> >>>> guest can create smaller windows if it wants, so I do not really follow
> >>>> you here.
> >>>
> >>> What I'm not clear on is what this RTAS return actually means.  Is it
> >>> saying the maximum size of the DMA window, or the maximum address
> >>> which can be mapped by that window?  Remember I don't have access to
> >>> PAPR documentation any more - nor do others reading these patches.
> >>
> >>
> >> It is literally "Largest contiguous block of TCEs allocated specifically
> >> for (that is, are reserved for) this PE". Which I understand as the maximum
> >> number of TCEs.
> > 
> > Ok, so essentially it's a property of the IOMMU.  Hrm, I guess
> > ram_size is good enough for now then.
> > 
> > [snip]
> >>> [snip]
> >>>>>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
> >>>>>> +                                          sPAPREnvironment *spapr,
> >>>>>> +                                          uint32_t token, uint32_t nargs,
> >>>>>> +                                          target_ulong args,
> >>>>>> +                                          uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    sPAPRPHBState *sphb;
> >>>>>> +    sPAPRPHBClass *spc;
> >>>>>> +    sPAPRTCETable *tcet = NULL;
> >>>>>> +    uint32_t addr, page_shift, window_shift, liobn;
> >>>>>> +    uint64_t buid;
> >>>>>> +    long ret;
> >>>>>> +
> >>>>>> +    if ((nargs != 5) || (nret != 4)) {
> >>>>>> +        goto param_error_exit;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> >>>>>> +    addr = rtas_ld(args, 0);
> >>>>>> +    sphb = spapr_pci_find_phb(spapr, buid);
> >>>>>> +    if (!sphb) {
> >>>>>> +        goto param_error_exit;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> >>>>>> +    if (!spc->ddw_create) {
> >>>>>> +        goto hw_error_exit;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    page_shift = rtas_ld(args, 3);
> >>>>>> +    window_shift = rtas_ld(args, 4);
> >>>>>> +    liobn = sphb->dma_liobn + 0x10000;
> >>>>>
> >>>>> Isn't using a fixed LIOBN here assuming you can only have a single DDW
> >>>>> per PHB?  That's true for now, but in theory shouldn't it be reported
> >>>>> by the PHB code itself?
> >>>>
> >>>>
> >>>> This should be a unique LIOBN so it is not up to PHB to choose. And we
> >>>> cannot make it completely random for migration purposes. I'll make it
> >>>> something like
> >>>>
> >>>> #define SPAPR_DDW_LIOBN(sphb, windownum) ((sphb)->dma_liobn | windownum)
> >>>
> >>> Ok.
> >>>
> >>> Really, the assigned liobns should be included in the migration stream
> >>> if they're not already.
> >>
> >> LIOBNs already migrate; a LIOBN is itself the instance id of a TCE table
> >> object in the migration stream.
> > 
> > Ok, so couldn't we just add an alloc_liobn() function instead of
> > hardcoding how the liobns are constructed?
> 
> 
> No. If we did that, the exact numbers would depend on the device order on
> the QEMU command line. The command line libvirt produces from a handmade
> XML and the one it produces from XML printed by "dumpxml" can have devices
> in a different order, so interrupt numbers, LIOBNs - all of this would get
> broken.

Ah, duh.  Clearly I'm still swapping back in my knowledge of qemu
migration.

What I meant before is that qemu should reconstruct the TCE
tables / LIOBNs from the information in the migration stream, which
remains true, but that can't really be done without more broadly
addressing the fact that qemu doesn't transmit the hardware
configuration in the migration stream.

Ok, so what should probably be done here is to explicitly assign
fields in the LIOBN to "device type", "device id" (vio reg or PHB
buid), and per-device instance (DDW number for PCI).
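
For illustration only, a sketch of such an encoding (the field widths and
type codes below are invented for the example; they are not from PAPR or
this series):

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical 32-bit LIOBN layout:
 *   [31:28] device type (e.g. 1 = VIO, 2 = PCI PHB)
 *   [27:8]  device id   (VIO reg, or a stable per-PHB index)
 *   [7:0]   per-device instance (DDW number for PCI)
 */
#define LIOBN_TYPE_VIO  0x1
#define LIOBN_TYPE_PCI  0x2

#define MAKE_LIOBN(type, dev_id, num)          \
    (((uint32_t)(type) << 28) |                \
     (((uint32_t)(dev_id) & 0xfffff) << 8) |   \
     ((uint32_t)(num) & 0xff))

int main(void)
{
    /* e.g. the second (DDW) window of the PHB with stable index 0 */
    uint32_t liobn = MAKE_LIOBN(LIOBN_TYPE_PCI, 0, 1);
    printf("liobn = 0x%08x\n", liobn);
    return 0;
}

The point being that the LIOBN is derived purely from stable device
identity, so it comes out the same regardless of command-line ordering.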

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
