From: Cédric Le Goater
Subject: Re: [Qemu-ppc] [PATCH v3 07/35] spapr/xive: introduce the XIVE Event Queues
Date: Wed, 9 May 2018 10:01:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 05/05/2018 06:29 AM, David Gibson wrote:
> On Fri, May 04, 2018 at 03:29:02PM +0200, Cédric Le Goater wrote:
>> On 05/04/2018 07:19 AM, David Gibson wrote:
>>> On Thu, May 03, 2018 at 04:37:29PM +0200, Cédric Le Goater wrote:
>>>> On 05/03/2018 08:25 AM, David Gibson wrote:
>>>>> On Thu, May 03, 2018 at 08:07:54AM +0200, Cédric Le Goater wrote:
>>>>>> On 05/03/2018 07:45 AM, David Gibson wrote:
>>>>>>> On Thu, Apr 26, 2018 at 11:48:06AM +0200, Cédric Le Goater wrote:
>>>>>>>> On 04/26/2018 09:25 AM, David Gibson wrote:
>>>>>>>>> On Thu, Apr 19, 2018 at 02:43:03PM +0200, Cédric Le Goater wrote:
>>>>>>>>>> The Event Queue Descriptor (EQD) table is an internal table of the
>>>>>>>>>> XIVE routing sub-engine. It specifies on which Event Queue the event
>>>>>>>>>> data should be posted when an exception occurs (later on pulled by the
>>>>>>>>>> OS) and which Virtual Processor to notify.
>>>>>>>>>
>>>>>>>>> Uhhh.. I thought the IVT said which queue and vp to notify, and the
>>>>>>>>> EQD gave metadata for event queues.
>>>>>>>>
>>>>>>>> yes, the above is poorly written. The Event Queue Descriptor contains the
>>>>>>>> guest address of the event queue in which the data is written. I will
>>>>>>>> rephrase.
>>>>>>>>
>>>>>>>> The IVT contains IVEs which indeed define, for an IRQ, which EQ to
>>>>>>>> notify and what data to push on the queue.
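
As a side note, here is a condensed sketch of what an IVE carries, per the
description above; field names are approximate, not the packed 64-bit
layout used by the hardware or the patch:

#include <stdint.h>
#include <stdbool.h>

/* For one IRQ: whether it is valid/masked, which EQ to notify, and the
 * data to push on that queue. */
struct ive_summary {
    bool     valid;
    bool     masked;
    uint32_t eq_index;      /* which EQ to notify                     */
    uint32_t eq_data;       /* data pushed on the queue when it fires */
};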
>>>>>>>>  
>>>>>>>>>> The Event Queue is a much
>>>>>>>>>> more complex structure but we start with a simple model for the sPAPR
>>>>>>>>>> machine.
>>>>>>>>>>
>>>>>>>>>> There is one XiveEQ per priority and these are stored under the XIVE
>>>>>>>>>> virtualization presenter (sPAPRXiveNVT). EQs are simply indexed with:
>>>>>>>>>>
>>>>>>>>>>        (server << 3) | (priority & 0x7)
>>>>>>>>>>
>>>>>>>>>> This is not in the XIVE architecture but, as the EQ index is never
>>>>>>>>>> exposed to the guest, neither in the hcalls nor in the device tree,
>>>>>>>>>> we are free to use whatever fits the current model best.
>>>>>>>>
>>>>>>>> This EQ indexing is worth noting because it will also show up in KVM
>>>>>>>> when building the IVE from the KVM irq state.
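
For illustration, a minimal sketch of that (server, priority) -> EQ index
mapping; the helper name is hypothetical, not taken from the patch:

#include <stdint.h>
#include <stdio.h>

/* One EQ per (server, priority) pair, eight priorities per server. */
static uint32_t spapr_xive_eq_index(uint32_t server, uint8_t priority)
{
    return (server << 3) | (priority & 0x7);
}

int main(void)
{
    /* e.g. server 2, priority 5 -> EQ index 21 */
    printf("EQ index: %u\n", spapr_xive_eq_index(2, 5));
    return 0;
}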
>>>>>>>
>>>>>>> Ok, are you saying that while this combined EQ index will never appear
>>>>>>> in guest <-> host interfaces, 
>>>>>>
>>>>>> Indeed.
>>>>>>
>>>>>>> it might show up in qemu <-> KVM interfaces?
>>>>>>
>>>>>> Not directly, but it is part of the IVE as the IVE_EQ_INDEX field. When
>>>>>> dumped, it has to be built in some way that is compatible with the
>>>>>> emulated mode in QEMU.
>>>>>
>>>>> Hrm.  But is the exact IVE contents visible to qemu (for a PAPR
>>>>> guest)?  
>>>>
>>>> The guest only uses hcalls whose arguments are:
>>>>
>>>>    - cpu numbers,
>>>>    - priority numbers from defined ranges,
>>>>    - logical interrupt numbers,
>>>>    - physical address of the EQ.
>>>>
>>>> The visible parts of the IVE for the guest are the 'priority', the 'cpu',
>>>> and the 'eisn', which is the effective IRQ number the guest is assigning
>>>> to the source. The 'eisn' will be pushed in the EQ.
>>>
>>> Ok.
>>>
>>>> The IVE EQ index is not visible.
>>>
>>> Good.
>>>
>>>>> I would have thought the qemu <-> KVM interfaces would have
>>>>> abstracted this the same way the guest <-> KVM interfaces do. Or is
>>>>> there a reason not to?
>>>>
>>>> It is practical to dump 64bit IVEs directly from KVM into the QEMU 
>>>> internal structures because it fits the emulated mode without doing 
>>>> any translation ... This might be seen as a shortcut. You will tell 
>>>> me when you reach the KVM part.   
>>>
>>> Ugh.. exposing to qemu the raw IVEs sounds like a bad idea to me.
>>
>> You definitely need the raw IVEs in QEMU emulation mode. The whole
>> routing relies on them.
> 
> I'm not exactly sure what you mean by "emulation mode" here.  Above,
> I'm talking specifically about a KVM HV, PAPR guest.

ah ok. I understand. 

KVM does not manipulate raw IVEs. Only OPAL manipulates the raw
XIVE structures. But as the emulation mode under QEMU also needs to
manipulate these structures, it seemed practical to use raw XIVE
structures to transfer the state from KVM to QEMU.

But it might not be such a great idea. I suppose we should define
a QEMU/KVM format for the exchanges with KVM and then, inside QEMU,
have a translation from that QEMU/KVM format to XIVE, the XIVE format
being the one used for migration.
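
A rough sketch of what that split could look like; every name and field
below is made up for illustration, this is not an actual interface
proposal:

#include <stdint.h>

/* Hypothetical QEMU/KVM exchange record for one EQ: only the
 * guest-visible parameters, no raw XIVE EQD words. */
struct kvm_eq_state {
    uint64_t qpage;    /* guest physical address of the OS event queue */
    uint32_t qshift;   /* log2 of the queue size                       */
    uint32_t qindex;   /* current entry index                          */
    uint8_t  qtoggle;  /* generation bit                               */
};

/* Hypothetical internal representation, i.e. the XIVE format that
 * would also be used for migration. */
struct xive_eq_internal {
    uint64_t qaddr;
    uint32_t qshift;
    uint32_t qindex;
    uint8_t  qgen;
};

/* The translation step done inside QEMU, as suggested above. */
static void eq_from_kvm(struct xive_eq_internal *eq,
                        const struct kvm_eq_state *s)
{
    eq->qaddr  = s->qpage;
    eq->qshift = s->qshift;
    eq->qindex = s->qindex;
    eq->qgen   = s->qtoggle;
}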


>>> When we migrate, we're going to have to assign the guest (server,
>>> priority) tuples to host EQ indices, and I think it makes more sense
>>> to do that in KVM and hide the raw indices from qemu than to have qemu
>>> mangle them explicitly on migration.
>>
>> We will need some mangling mechanism for the KVM ioctls saving and
>> restoring state. This is very similar to XICS. 
>>  
>>>>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>>>>>
>>>>>>>>> Is the EQD actually modifiable by a guest?  Or are the settings of the
>>>>>>>>> EQs fixed by PAPR?
>>>>>>>>
>>>>>>>> The guest uses the H_INT_SET_QUEUE_CONFIG hcall to define the address
>>>>>>>> of the event queue for a prio/server pair.
>>>>>>>
>>>>>>> Ok, so the EQD can be modified by the guest.  In which case we need to
>>>>>>> work out what object owns it, since it'll need to migrate it.
>>>>>>
>>>>>> Indeed. The EQD are CPU related as there is one EQD per (cpu, priority)
>>>>>> pair. The KVM patchset dumps/restores the eight XiveEQ structs using
>>>>>> per-cpu ioctls. The EQ in the OS RAM is marked dirty at that stage.
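
In pseudo-code, the save path could look roughly like this; the ioctl
request, encoding and structure layout are placeholders, not the actual
KVM ABI from the patchset:

#include <stdint.h>
#include <sys/ioctl.h>

#define XIVE_NR_PRIORITIES      8

/* Placeholder per-priority EQ state returned by KVM. */
struct kvm_vcpu_eq_state {
    uint64_t qpage;             /* guest address of the OS EQ, marked dirty */
    uint32_t qshift;            /* log2 of the queue size                   */
    uint32_t qindex;            /* current entry index                      */
};

/* One dump per vCPU: its eight EQ states, one per priority. */
struct kvm_vcpu_eq_dump {
    struct kvm_vcpu_eq_state eq[XIVE_NR_PRIORITIES];
};

#define KVM_XIVE_GET_EQ_STATE   _IOR('k', 0x42, struct kvm_vcpu_eq_dump)

static int xive_get_vcpu_eqs(int vcpu_fd, struct kvm_vcpu_eq_dump *dump)
{
    return ioctl(vcpu_fd, KVM_XIVE_GET_EQ_STATE, dump);
}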
>>>>>
>>>>> To make sure I'm clear: for PAPR there's a strict relationship between
>>>>> EQD and CPU (one EQD for each (cpu, priority) tuple).  
>>>>
>>>> Yes.
>>>>
>>>>> But for powernv that's not the case, right?  
>>>>
>>>> It is.
>>>
>>> Uh.. I don't think either of us phrased that well, I'm still not sure
>>> which way you're answering that.
>>
>> there's a strict relationship between EQD and CPU (one EQD for each (cpu, 
>> priority) tuple) in spapr and in powernv.
> 
> For powernv that seems to be contradicted by what you say below.

ok. I see what you mean. There is a difference for the hypervisor when
guests are running. As QEMU PowerNV does not support guests (yet),
we can start the model with a strict relationship between EQD and
CPU.

But it's not the case when guests are running, because the EQD refers
to an NVT/VP which can be a virtual processor or a group of them.

The current model takes a shortcut: the CPU list should be scanned
to find matching CAM lines (W2 in the TIMA). I need to take a closer
look for powernv, even if it is not strictly needed for the model
without guests.
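
For reference, a rough sketch of the CPU scan I mean; the types and
helper are hypothetical, not code from the series:

#include <stdint.h>

/* Look for the thread whose CAM line (word 2 of the TIMA OS ring)
 * matches the NVT/VP referenced by the EQD. */
struct thread_ctx {
    uint32_t os_cam;                    /* CAM line, TIMA word 2 (W2) */
};

static int find_matching_thread(const struct thread_ctx *threads,
                                int nr_threads, uint32_t nvt_cam)
{
    for (int i = 0; i < nr_threads; i++) {
        if (threads[i].os_cam == nvt_cam) {
            return i;                   /* notify this thread */
        }
    }
    return -1;                          /* no match: backlog/escalation */
}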

> AFAICT there might be a strict association at the host kernel or even
> the OPAL level, but not at the hardware level.
> 
>>>>> AIUI the mapping of EQs to cpus was configurable, is that right?
>>>>
>>>> Each cpu has 8 EQD. Same for virtual cpus.
>>>
>>> Hmm.. but is that 8 EQD per cpu something built into the hardware, or
>>> just a convention of how the host kernel and OPAL operate?
>>
>> It's not baked into the HW; it is used by the HW to route the
>> notification. The EQD contains the EQ characteristics:
>>
>> * functional bits :
>>   - valid bit
>>   - enqueue bit, to update OS in RAM EQ or not
>>   - unconditional notification
>>   - backlog
>>   - escalation
>>   - ...
>> * OS EQ fields 
>>   - physical address
>>   - entry index
>>   - toggle bit
>> * NVT fields
>>   - block/chip
>>   - index
>> * etc.
>>
>> It's a big structure: 8 words.
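
Condensed in a sketch (field names are approximate, the real descriptor
is 8 words of packed bitfields):

#include <stdint.h>
#include <stdbool.h>

/* Summary of the EQD contents listed above; not the hardware layout. */
struct eqd_summary {
    /* functional bits */
    bool     valid;
    bool     enqueue;          /* update the OS EQ in RAM or not */
    bool     uncond_notify;
    bool     backlog;
    bool     escalate;
    /* OS EQ fields */
    uint64_t eq_addr;          /* physical address                */
    uint32_t eq_index;         /* entry index                     */
    bool     eq_toggle;        /* toggle bit                      */
    /* NVT fields */
    uint8_t  nvt_block;        /* block/chip                      */
    uint32_t nvt_index;
};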
> 
> Ok.  So yeah, the cpu association of the EQ is there in the NVT
> fields, not baked into the hardware.

yes.

C. 

>> The EQD table is allocated by OPAL/skiboot and fed to the HW for
>> its use. The powernv OS uses OPAL calls to configure the EQD to its
>> needs:
>>
>> int64_t opal_xive_set_queue_info(uint64_t vp, uint32_t prio,
>>                                  uint64_t qpage,
>>                                  uint64_t qsize,
>>                                  uint64_t qflags);
>>
>>
>> sPAPR uses an hcall:
>>
>> static long plpar_int_set_queue_config(unsigned long flags,
>>                                        unsigned long target,
>>                                        unsigned long priority,
>>                                        unsigned long qpage,
>>                                        unsigned long qsize)
>>
>>
>> but it is translated into an OPAL call in KVM.
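
Roughly, the glue could look like this; the OPAL prototype is the one
quoted above, but the wrapper and xive_target_to_vp() are placeholders
(the flags translation in particular is glossed over):

#include <stdint.h>

int64_t opal_xive_set_queue_info(uint64_t vp, uint32_t prio,
                                 uint64_t qpage, uint64_t qsize,
                                 uint64_t qflags);

static uint64_t xive_target_to_vp(uint64_t target)
{
    return target;      /* placeholder: real code maps server -> VP id */
}

static int64_t h_int_set_queue_config(uint64_t flags, uint64_t target,
                                      uint64_t priority, uint64_t qpage,
                                      uint64_t qsize)
{
    uint64_t vp = xive_target_to_vp(target);

    return opal_xive_set_queue_info(vp, priority, qpage, qsize, flags);
}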
>>
>> C.
>>
>>  
>>>  
>>>>
>>>> I am not sure what you understood before? It is surely something
>>>> I wrote, my XIVE understanding is still making progress.
>>>>
>>>>
>>>> C.
>>>>
>>>
>>
> 



