qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [PATCH v3 03/35] ppc/xive: introduce the XiveFabric inter


From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH v3 03/35] ppc/xive: introduce the XiveFabric interface
Date: Thu, 3 May 2018 15:13:33 +1000
User-agent: Mutt/1.9.3 (2018-01-21)

On Wed, May 02, 2018 at 05:28:23PM +0200, Cédric Le Goater wrote:
> On 04/27/2018 08:32 AM, David Gibson wrote:
> > On Thu, Apr 26, 2018 at 12:30:42PM +0200, Cédric Le Goater wrote:
> >> On 04/26/2018 05:54 AM, David Gibson wrote:
> >>> On Tue, Apr 24, 2018 at 11:33:11AM +0200, Cédric Le Goater wrote:
> >>>> On 04/24/2018 08:46 AM, David Gibson wrote:
> >>>>> On Mon, Apr 23, 2018 at 09:58:43AM +0200, Cédric Le Goater wrote:
> >>>>>> On 04/23/2018 08:46 AM, David Gibson wrote:
> >>>>>>> On Thu, Apr 19, 2018 at 02:42:59PM +0200, Cédric Le Goater wrote:
> >>>>>>>> The XiveFabric offers a simple interface, between the XiveSourve
> >>>>>>>> object and the device model owning the interrupt sources, to forward
> >>>>>>>> an event notification to the XIVE interrupt controller of the machine
> >>>>>>>> and if the owner is the controller, to call directly the routing
> >>>>>>>> sub-engine.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
> >>>>>>>> ---
> >>>>>>>>  hw/intc/xive.c        | 37 ++++++++++++++++++++++++++++++++++++-
> >>>>>>>>  include/hw/ppc/xive.h | 25 +++++++++++++++++++++++++
> >>>>>>>>  2 files changed, 61 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >>>>>>>> index 060976077dd7..b4c3d06c1219 100644
> >>>>>>>> --- a/hw/intc/xive.c
> >>>>>>>> +++ b/hw/intc/xive.c
> >>>>>>>> @@ -17,6 +17,21 @@
> >>>>>>>>  #include "hw/ppc/xive.h"
> >>>>>>>>  
> >>>>>>>>  /*
> >>>>>>>> + * XIVE Fabric
> >>>>>>>> + */
> >>>>>>>> +
> >>>>>>>> +static void xive_fabric_route(XiveFabric *xf, int lisn)
> >>>>>>>> +{
> >>>>>>>> +
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static const TypeInfo xive_fabric_info = {
> >>>>>>>> +    .name = TYPE_XIVE_FABRIC,
> >>>>>>>> +    .parent = TYPE_INTERFACE,
> >>>>>>>> +    .class_size = sizeof(XiveFabricClass),
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> +/*
> >>>>>>>>   * XIVE Interrupt Source
> >>>>>>>>   */
> >>>>>>>>  
> >>>>>>>> @@ -97,11 +112,19 @@ static bool xive_source_pq_trigger(XiveSource 
> >>>>>>>> *xsrc, uint32_t srcno)
> >>>>>>>>  
> >>>>>>>>  /*
> >>>>>>>>   * Forward the source event notification to the associated 
> >>>>>>>> XiveFabric,
> >>>>>>>> - * the device owning the sources.
> >>>>>>>> + * the device owning the sources, or perform the routing if the 
> >>>>>>>> device
> >>>>>>>> + * is the interrupt controller.
> >>>>>>>>   */
> >>>>>>>>  static void xive_source_notify(XiveSource *xsrc, int srcno)
> >>>>>>>>  {
> >>>>>>>>  
> >>>>>>>> +    XiveFabricClass *xfc = XIVE_FABRIC_GET_CLASS(xsrc->xive);
> >>>>>>>> +
> >>>>>>>> +    if (xfc->notify) {
> >>>>>>>> +        xfc->notify(xsrc->xive, srcno + xsrc->offset);
> >>>>>>>> +    } else {
> >>>>>>>> +        xive_fabric_route(xsrc->xive, srcno + xsrc->offset);
> >>>>>>>> +    }
> >>>>>>>
> >>>>>>> Why 2 cases?  Can't the XiveFabric object just make its notify equal
> >>>>>>> to xive_fabric_route if that's what it wants?
> >>>>>> Under sPAPR, all the sources, IPIs and virtual device interrupts, 
> >>>>>> generate events which are directly routed by xive_fabric_route(). 
> >>>>>> There is no need of an extra hop. Indeed. 
> >>>>>
> >>>>> Ok.
> >>>>>
> >>>>>> Under PowerNV, some sources forward the notification to the routing 
> >>>>>> engine using a specific MMIO load on a notify address which is stored 
> >>>>>> in one of the controller registers. So we need a hop to reach the 
> >>>>>> device model, owning the sources, and do that load :
> >>>>>
> >>>>> Hm.  So you're saying that in pnv some sources send their notification
> >>>>> to some other unit, 
> >>>>
> >>>> Not to any unit/device, to the device owning the sources.
> >>>>
> >>>> For the XiveSource object under PSI, the XIVEFabric interface is the 
> >>>> PSI device object it self, which knows how to forward the notification 
> >>>> on the XIVE Power "bus". To be more precise, the PSI HB device has 
> >>>> 14 interrupt sources, which notifications are forwarded using a MMIO 
> >>>> load to some address. The load address is configured (by skiboot) in 
> >>>> one of the PSI device registers, and points to a MMIO region of the 
> >>>> main XIVE interrupt controller. 
> >>>>
> >>>> The PHB4 sources should be the same.
> >>>>
> >>>> For the XiveSource object (all interrupts) under sPAPRXive, the 
> >>>> XIVEFabric is the main interrupt controller sPAPRXive.
> >>>>
> >>>> For the XiveSource object (IPIs) under PnvXive, the XIVEFabric is 
> >>>> also the main interrupt controller PnvXive.
> >>>
> >>> Hrm.  Apparently I'm missing something, I'm really not getting what
> >>> you're trying to explain here.
> >>
> >> I see that. Let's try again.
> >>
> >>>>> that would then (after possible masking) forward on to the overall> 
> >>>>> xive fabric ? 
> >>>>
> >>>> yes. May be XIVEFabric is a confusing name. What about XIVEForwarder ? 
> >>>
> >>> Maybe..?
> >>>
> >>>>> That seems like a property of the source object, 
> >>>>
> >>>> The source object is generic. It's a bunch of PQ bits that can be 
> >>>> controlled by MMIOs. Nothing more.
> >>>
> >>> Hmm.  Isn't the source object also responsible for forwarding the
> >>> interrupt to something up the chain (whatever that is)?
> >>
> >> Yes but it can not forward directly. The XiveSource is generic and 
> >> can only call a handler :
> >>
> >>    xfc->notify(xsrc->xive, srcno + xsrc->offset);
> > 
> > But.. your patch doesn't do that always, it's conditional which I
> > still don't understand.
> 
> Because at the end of the notify/forward chain, you route.

Hrm.  I'm really not understanding this notify/forward thing as
distinct from routing.  I mean, from the description here it sounds
kind of like cascaded interrupt controllers, which qemu already has
mechanisms to handle, but I didn't think (most) POWER9 devices worked
like that.

> >> The device model owner, the parent of the XiveSource object, would 
> >> do the real forward.
> > 
> > Why?  
> 
> because, in my idea of the XiveSource concept, it does not have 
> the logic to do so: the register for the MMIO address to use, 
> another one for the IVT offset, etc
> 
> > I mean the XiveSource basically represents the xive irq related
> > logic of the PHB or whatever, why would it not represent *all* of
> > that, rather than just the ESB bits, meaning the owner has to have
> > some more xive logic for the forwarding.
> 
> ok. This is where we diverge in the concept. 
> 
> The PQ bits, the ESB MMIO region handlers can be easily shared 
> between the different device models. They are the common part
> of devices with XIVE interrupt sources.

Sure, but QOM subclasses are a thing, which sounds like a better way
of doing this than having to have extra XIVE logic in all the parent
devices.

> > Note that I don't think the fact that some sources notify via mmio and
> > some are internal really matters.  It's not like we're modelling the
> > power bus down to the wire-transaction level.
> 
> yes but the configuration of the devices are different. pnv devices 
> will have registers accessible through MMIO or XSCOM and configured
> by the firmware. spapr is all set up by QEMU.

Ah.. ok.  Actually modelling the mmio forwards probably makes sense
then.  I still think it makes more sense in a pnv XiveSource subclass
rather than putting it in the containing device.

> >> It's very similar to what we have today with XICS :
> >>
> >>    - The sPAPR model has an ICSState  
> >>    - The PnvPSI model has an ICSState 
> >>    - The PnvPHB3 model has two ICSStates
> >>
> >> and the 'xics' pointer in ICSState points to the 'interrupt unit' of 
> >> the machine to do resends and to grab ICPs. So it used for routing 
> >> essentially.
> > 
> > Hmm.  I think you and I are looking at XICSFabric kind of
> > differently.  As I see it, it's not really an active component at
> > all.  Rather it's basically a global "map" of the xics components so
> > that they can find each other.
> 
> ok. I am not that far either. 
>  
> >> in Xive 
> >>
> >>    - sPAPRXive model has a XiveSource
> >>    - PnvXive model has a XiveSource
> >>    - PnvPSI model has a XiveSource
> >>    - PnvPHB4 model should have also.
> >>
> >> and the 'xive' pointer in XiveSource points to the parent object,
> > 
> > Uh.. yeah.. the xics pointer in ICS units doesn't point to the parent
> > object, except maybe by accident.  It's absolutely intended to be
> > global, and so points to the machine.
> 
> yes. I agree. 
> 
> XIVE has more layers and visible components due to the internal 
> tables used for routing.

Right.

> >> which will handle the event notification forwarding or routing.
> > 
> > Ok, how about this for a partial model.  We have:
> > 
> > XiveSource objects:
> >     * Owns an ESB table
> >     * Knows the mapping of its local irq offsets to global irq
> >       numbers
> 
> That is the 'offset' attribute I suppose. This is set at runtime 
> for the powernv devices.

Ok, so that offset is effectively a register (which will have to be
migrated), rather than a device property.

> For pseries, we should need it for 
> passthrough. I think. I haven't looked at that part yet.

Uh.. I really hope not.  AIUI the offsets are decided by the platform
rather than the guest in this case, yes?  In which case if we can't
count on a fixed offset even in passthrough mode, then migration is
basically impossible.

> >     * Provides the mmio interface for ESB manipulation
> >     * When neccessary, notifies a new interrupt to a XiveRouter
> 
> ok. I think that what we have today fits the idea then. 

Yes, I think so to - just clarifying in the context of the rest of
this proposal.

> > XiveRouter objects:
> >     * Responsible for a fixed range of global irq numbers
> >     * Owns an IVT (but what that means can vary, see below)
> 
> the size of the IVT table is determined at runtime for powernv.

That's fine - both the size and base of the IVT will be registers of
the powernv variant of the device.

> >     * When notified of an irq, routes it to the appropriate EQ
> >       (haven't thought about this part yet)
> 
> we will need a class handler to get EQs

Sure.

> >     * Abstract class - needs subclasses to define how to get IVEs
> 
> OK. We will need a few other ops. The router needs to :
> 
>       - get IVEs
>       - get EQ descriptors
>       - update OS EQs (write event data in OS RAM)

IIUC the actual EQ (as opposed to its descriptor) will be in guest RAM
for both powernv and spapr, so this can be common (utiliizing a
subclass hook to locate the EQ base address).

>       - update EQ descriptors (to set EQ descriptor index & toggle)
>       - get an NVT/VP (to notify CPUs)

Getting NVT/VP sounds more like it would belong in the XiveFabric than
the router, but I haven't looked at this in detail yet.

> For powernv, we should also consider updating the NVT/VP. It can be 
> done later. 
> 
> > XiveFabric interface:
> >     * Lets XIVE components locate each other
> 
> hmm, it should be a chain : 
> 
>       source -> router -> presenter -> cpu
> 
> So the components should not have to locate each other. The presenter
> does not know about the source for instance. Only the sources need
> to forward events to the main controller logic doing the routing.

source -> router

From the above sounds like we can maybe make this just a router
property in the source, rather than a lookup through a fabric.

router -> presenter

Here we might still need a "fabric".  By defintion the router can
direct to a bunch of different presenters, so it needs some kind of
map to find them, no?

presenter -> cpu

This one's trivial; just the cpu->intc pointer or similar.

> >     * get_router() method: maps a global irq number to XiveRouter
> >       object
> 
> Ah. you are thinking about the multichip case under powernv. I need
> to look at that more closely. But XIVE has a concept of block which 
> is used by skiboot to map a chip to a block and the XIVE tables have 
> a block field.

Hmm.. I think we need to understand this to make a real model.  I'd
thought the block number would just be the chip number somewhere in
the high bits of the global irq, but sounds like there might be yet
another indirection here.

> >     * Always global (implemented on the machine)
> 
> OK. this is more or less the object modeling the main interrupt 
> controller ? What I called sPAPRXive in the current patchset.

No.  This is *always* global - assuming we need it at all - even on
multichip powernv.

> > On pseries we have:
> > 
> >     ? XiveSource objects.  We probably only need one, but 1 for
> >     LSI and one for MSI might be convenient.  
> 
> It's not too ugly for the moment. If we create a source object 
> for LSIs only that might be more complex for passthrough devices 
> and their associated ESB MMIO region.
> 
> side note :
> 
> LSIs work under TCG, and used to work under KVM until we removed
> the StoreEOI support. Since the EOI is now different, we need 
> to find a way to handle EOI for guest virtual LSIs which are 
> not LSIs for the host.

I can't quite picture that case - can you give a concrete example?

> We can still change the Linux spapr 
> backend or have QEMU handle the EOI for virtual LSIs. This is 
> on my TODO list.
> 
> >       More wouldn't break the model
> 
> It should not. Initially the pachset had two source objects : 
> one for IPIs and one for the virtual devices interrupts. 
> 
> >     1 sPAPRXiveRouter.  This is a subclass of XiveRouter that
> >     holds the IVT internall (and migrates it).
> 
> OK.
> 
> >     the XiveFabric implementation always returns the single global
> >     router for get_router()
> 
> we can do that. 

Again, if it's true that each source object always forwards to just
one router, then we don't need a get_router(), we can just use a link
property.

> Do we gather all XIVE components objects under an object 'sPAPRXive' 
> modeling that way the main interrupt controller of the machine ? 

No, I don't think so.

> It should not have any state but it will hold the TIMA region 
> most certainly, and the addresses where to map the ESB and 
> TIMA regions.

Uh.. I'd expect the TIMAs to be held by the NVT objects, which we
haven't covered yet.

> KVM support will bring extra needs.
>  
> > On powernv we have:
> >     N XiveSource objects.  Some in PHBs, some extra ones on each
> >     chip
> 
> yes and PSI to start with.  
> 
> >     (#chips) PowerXiveRouter objects.  This subclass of XiveRouter
> >     stores the register giving the IVT base address and migrates
> >     that, but the IVT contents are in RAM
> > 
> >     the XiveFabric get_router() implementation returns the right
> >     chip's router based on the irq number
> > 
> > Obviously the router->EQ sides still needs a bunch of thought.
> 
> These are all routing tables : 
> 
>       - IVT
>       - EQDT
>       - VPDT  

Thinking about what you've said, I'm thinking maybe what we need on
the source side is two versions which don't exactly correspond to
spapr vs powernv:

InBandXiveSource: this one's notify behaviour is to issue an mmio read
to a configured address.  It's expected that mmio address belongs to a
XiveRouter, but the source itself doesn't know anything routers.

OutOfBandXiveSource: this one's notify behaviour is to explicitly poke
a XiveRouter.  A link to the router and offset (which range of router
irqs are used by this source) would be object properties.

powernv would use the first for "external" irq sources and the second
for internal ones.  spapr would use the second for everything.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]