qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] spapr: Don't use the "dual" interrupt controlle


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH] spapr: Don't use the "dual" interrupt controller mode with an old hypervisor
Date: Wed, 12 Jun 2019 11:29:36 +1000
User-agent: Mutt/1.11.4 (2019-03-13)

On Fri, Jun 07, 2019 at 11:49:50AM +0200, Greg Kurz wrote:
65;5603;1c> On Fri, 7 Jun 2019 10:17:58 +0200
> Cédric Le Goater <address@hidden> wrote:
> 
> > On 07/06/2019 02:19, David Gibson wrote:
> > > On Thu, Jun 06, 2019 at 07:08:59PM +0200, Greg Kurz wrote:  
> > >> If KVM is too old to support XIVE native exploitation mode, we might end
> > >> up using the emulated XIVE after CAS. This is sub-optimal if KVM 
> > >> in-kernel
> > >> XICS is available, which is the case most of the time.  
> > > 
> > > This is intentional.  A predictable guest environment trumps performance. 
> > >  
> > 
> > I don't agree. 
> > 
> > If the user does not specify any specific interrupt mode, we should favor 
> > the faster one. 
> > 
> 
> This all boils down to the semantics of "dual"... "XICS AND XIVE" or "any
> of XICS or XIVE" ?

It means xics and xive are available, guest gets to choose.

> We already have a precedent when using pre-power9 compat mode. If the
> user passes ic-mode=dual,max-compat-cpu=power8, we internally convert
> "dual" to act as "xics". The rationale is that a guest running in
> power8 architected mode isn't supposed to know about XIVE at all.
> 
> Should we forbid this config and exit QEMU instead ?

Uh.. yes, that would make sense, actually.  However
max-compat-cpu=power8 without an explicit ic-mode=xive or ic-mode=dual
should act as ic-mode=xics.  The difference here is that it's an
explicitly user specified constraint, rather than a host implied
constraint.

> > Here is the current matrix (with this patch) for guests running on an 
> > old KVM, that is without KVM XIVE support. Let's discuss on what we
> > want. 
> > 
> >                         kernel_irqchip
> > 
> >            (default) 
> > ic-mode     allowed           off            on 
> > 
> > dual        XICS KVM       XICS emul.(3)   XICS KVM         (default mode)
> > xics        XICS KVM       XICS emul.      XICS KVM    
> > xive        XIVE emul.(1)  XIVE emul.     QEMU failure (2)
> > 
> > 
> > (1) QEMU warns with "warning: kernel_irqchip requested but unavailable: 
> >     IRQ_XIVE capability must be present for KVM" 
> > (2) QEMU fails with "kernel_irqchip requested but unavailable: 
> >     IRQ_XIVE capability must be present for KVM" 
> > (3) That is wrong I think, we should get XIVE emulated.
> > 
> 
> This is the logical consequence of turning "dual" into "xics".
> 
> > 
> > what you would want is XIVE emulation when ic-mode=dual and 
> > kernel_irqchip=allowed, which is the behavior with this patch (but there
> 
> I guess you mean s/with/without 
> 
> > are reboot bugs)
> > 
> 
> Yeah. If the semantics of "dual" is to always advertise XICS AND XIVE, and we
> keep the current fallback behavior, then we need each implementation to have
> proper init/teardown support as well as proper rollback on error paths.
> 
> This is definitely not the case: rollback is missing in both in-kernel 
> XICS and XIVE code and the emulated XICS and XIVE don't have teardown.
> 
> This can generate a variety of bugs, including QEMU crashes... the old KVM 
> case
> is just one of them, but there might be others.
> 
> >  
> > >> Also, an old KVM may not allow to destroy and re-create the KVM XICS, 
> > >> which
> > >> is precisely what "dual" does during machine reset. This causes QEMU to 
> > >> try
> > >> to switch to emulated XICS and to crash because RTAS call de-registration
> > >> isn't handled correctly. We could possibly fix that, but again we would
> > >> still end up with an emulated XICS or XIVE.  
> > > 
> > > Ugh, that's a problem.  
> > 
> > Yes. It's another problem around the way we cleanup the allocated resources.
> > It should be another patch.
> > 
> 
> Yeah, probably many other patches...
> 
> > >   
> > >> "dual" is definitely not a good choice with older KVMs. Internally force
> > >> XICS when we detect this.  
> > > 
> > > But this is not an acceptable solution.  Silently changing the guest
> > > visible environment based on host capabilities is never ok.   
> > 
> > If the host (KVM) doesn't have a capability, what is the point of trying 
> > to use it if we can do better. I know you are considering KVM/QEMU as a
> > whole but who would run with kernel_irqchip=off ?
> > 
> 
> The real problem is when you don't pass kernel_irqchip at all. Combined
> with "dual", this gives a fair number of cases. Do we want to support
> all possible transitions ?
> 
> > > We must
> > > either give the guest environment that the user has requested, or fail
> > > outright.  
> > 
> > 'dual' mode means both and the user is not requesting XIVE. We are changing 
> > the priority of choices from :
> > 
> >  1. KVM XIVE
> >  2. QEMU XIVE
> >  3. KVM XICS
> >  4. QEMU XICS
> > 
> > to:
> > 
> >  1. KVM XIVE
> >  2. KVM XICS
> >  3. QEMU XIVE
> >  4. QEMU XICS
> > 
> > which is better I think.
> > 
> 
> Using KVM XICS is indeed better than QEMU XIVE... but IIUC what David
> is saying, kernel-irqchip semantics are:
> 
> - on: user favors performance, at the expense of a reduced spectrum
>       of usable hosts
> 
> - absent: user favors being able to start the VM on a wider spectrum
>           of hosts, at the possible expense of performance

Correct.

> - off: user wants the VM to start on any host, doesn't care for
>        performance at all

Well, TBH, the main use of kernel-irqchip=off is for debugging and testing.

> 
> So the only way to have 1. KVM XIVE 2.KVM XICS would be to pass "on".
> 
> > C.
> > 
> > 
> > >   
> > >>
> > >> Signed-off-by: Greg Kurz <address@hidden>
> > >> ---
> > >>  hw/ppc/spapr_irq.c |   10 ++++++++++
> > >>  1 file changed, 10 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > >> index 3156daf09381..d788bd662a7a 100644
> > >> --- a/hw/ppc/spapr_irq.c
> > >> +++ b/hw/ppc/spapr_irq.c
> > >> @@ -18,6 +18,7 @@
> > >>  #include "hw/ppc/xics_spapr.h"
> > >>  #include "cpu-models.h"
> > >>  #include "sysemu/kvm.h"
> > >> +#include "kvm_ppc.h"
> > >>  
> > >>  #include "trace.h"
> > >>  
> > >> @@ -668,6 +669,15 @@ static void spapr_irq_check(SpaprMachineState 
> > >> *spapr, Error **errp)
> > >>              return;
> > >>          }
> > >>      }
> > >> +
> > >> +    /*
> > >> +     * KVM may be too old to support XIVE, in which case we'd rather try
> > >> +     * to use the in-kernel XICS instead of the emulated XIVE.
> > >> +     */
> > >> +    if (kvm_enabled() && !kvmppc_has_cap_xive() &&
> > >> +        spapr->irq == &spapr_irq_dual) {
> > >> +        spapr->irq = &spapr_irq_xics;
> > >> +    }
> > >>  }
> > >>  
> > >>  /*
> > >>  
> > >   
> > 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]