[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH] spapr: Don't use the "dual" interrupt controller
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [PATCH] spapr: Don't use the "dual" interrupt controller mode with an old hypervisor |
Date: |
Wed, 12 Jun 2019 11:29:36 +1000 |
User-agent: |
Mutt/1.11.4 (2019-03-13) |
On Fri, Jun 07, 2019 at 11:49:50AM +0200, Greg Kurz wrote:
65;5603;1c> On Fri, 7 Jun 2019 10:17:58 +0200
> Cédric Le Goater <address@hidden> wrote:
>
> > On 07/06/2019 02:19, David Gibson wrote:
> > > On Thu, Jun 06, 2019 at 07:08:59PM +0200, Greg Kurz wrote:
> > >> If KVM is too old to support XIVE native exploitation mode, we might end
> > >> up using the emulated XIVE after CAS. This is sub-optimal if KVM
> > >> in-kernel
> > >> XICS is available, which is the case most of the time.
> > >
> > > This is intentional. A predictable guest environment trumps performance.
> > >
> >
> > I don't agree.
> >
> > If the user does not specify any specific interrupt mode, we should favor
> > the faster one.
> >
>
> This all boils down to the semantics of "dual"... "XICS AND XIVE" or "any
> of XICS or XIVE" ?
It means xics and xive are available, guest gets to choose.
> We already have a precedent when using pre-power9 compat mode. If the
> user passes ic-mode=dual,max-compat-cpu=power8, we internally convert
> "dual" to act as "xics". The rationale is that a guest running in
> power8 architected mode isn't supposed to know about XIVE at all.
>
> Should we forbid this config and exit QEMU instead ?
Uh.. yes, that would make sense, actually. However
max-compat-cpu=power8 without an explicit ic-mode=xive or ic-mode=dual
should act as ic-mode=xics. The difference here is that it's an
explicitly user specified constraint, rather than a host implied
constraint.
> > Here is the current matrix (with this patch) for guests running on an
> > old KVM, that is without KVM XIVE support. Let's discuss on what we
> > want.
> >
> > kernel_irqchip
> >
> > (default)
> > ic-mode allowed off on
> >
> > dual XICS KVM XICS emul.(3) XICS KVM (default mode)
> > xics XICS KVM XICS emul. XICS KVM
> > xive XIVE emul.(1) XIVE emul. QEMU failure (2)
> >
> >
> > (1) QEMU warns with "warning: kernel_irqchip requested but unavailable:
> > IRQ_XIVE capability must be present for KVM"
> > (2) QEMU fails with "kernel_irqchip requested but unavailable:
> > IRQ_XIVE capability must be present for KVM"
> > (3) That is wrong I think, we should get XIVE emulated.
> >
>
> This is the logical consequence of turning "dual" into "xics".
>
> >
> > what you would want is XIVE emulation when ic-mode=dual and
> > kernel_irqchip=allowed, which is the behavior with this patch (but there
>
> I guess you mean s/with/without
>
> > are reboot bugs)
> >
>
> Yeah. If the semantics of "dual" is to always advertise XICS AND XIVE, and we
> keep the current fallback behavior, then we need each implementation to have
> proper init/teardown support as well as proper rollback on error paths.
>
> This is definitely not the case: rollback is missing in both in-kernel
> XICS and XIVE code and the emulated XICS and XIVE don't have teardown.
>
> This can generate a variety of bugs, including QEMU crashes... the old KVM
> case
> is just one of them, but there might be others.
>
> >
> > >> Also, an old KVM may not allow to destroy and re-create the KVM XICS,
> > >> which
> > >> is precisely what "dual" does during machine reset. This causes QEMU to
> > >> try
> > >> to switch to emulated XICS and to crash because RTAS call de-registration
> > >> isn't handled correctly. We could possibly fix that, but again we would
> > >> still end up with an emulated XICS or XIVE.
> > >
> > > Ugh, that's a problem.
> >
> > Yes. It's another problem around the way we cleanup the allocated resources.
> > It should be another patch.
> >
>
> Yeah, probably many other patches...
>
> > >
> > >> "dual" is definitely not a good choice with older KVMs. Internally force
> > >> XICS when we detect this.
> > >
> > > But this is not an acceptable solution. Silently changing the guest
> > > visible environment based on host capabilities is never ok.
> >
> > If the host (KVM) doesn't have a capability, what is the point of trying
> > to use it if we can do better. I know you are considering KVM/QEMU as a
> > whole but who would run with kernel_irqchip=off ?
> >
>
> The real problem is when you don't pass kernel_irqchip at all. Combined
> with "dual", this gives a fair number of cases. Do we want to support
> all possible transitions ?
>
> > > We must
> > > either give the guest environment that the user has requested, or fail
> > > outright.
> >
> > 'dual' mode means both and the user is not requesting XIVE. We are changing
> > the priority of choices from :
> >
> > 1. KVM XIVE
> > 2. QEMU XIVE
> > 3. KVM XICS
> > 4. QEMU XICS
> >
> > to:
> >
> > 1. KVM XIVE
> > 2. KVM XICS
> > 3. QEMU XIVE
> > 4. QEMU XICS
> >
> > which is better I think.
> >
>
> Using KVM XICS is indeed better than QEMU XIVE... but IIUC what David
> is saying, kernel-irqchip semantics are:
>
> - on: user favors performance, at the expense of a reduced spectrum
> of usable hosts
>
> - absent: user favors being able to start the VM on a wider spectrum
> of hosts, at the possible expense of performance
Correct.
> - off: user wants the VM to start on any host, doesn't care for
> performance at all
Well, TBH, the main use of kernel-irqchip=off is for debugging and testing.
>
> So the only way to have 1. KVM XIVE 2.KVM XICS would be to pass "on".
>
> > C.
> >
> >
> > >
> > >>
> > >> Signed-off-by: Greg Kurz <address@hidden>
> > >> ---
> > >> hw/ppc/spapr_irq.c | 10 ++++++++++
> > >> 1 file changed, 10 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > >> index 3156daf09381..d788bd662a7a 100644
> > >> --- a/hw/ppc/spapr_irq.c
> > >> +++ b/hw/ppc/spapr_irq.c
> > >> @@ -18,6 +18,7 @@
> > >> #include "hw/ppc/xics_spapr.h"
> > >> #include "cpu-models.h"
> > >> #include "sysemu/kvm.h"
> > >> +#include "kvm_ppc.h"
> > >>
> > >> #include "trace.h"
> > >>
> > >> @@ -668,6 +669,15 @@ static void spapr_irq_check(SpaprMachineState
> > >> *spapr, Error **errp)
> > >> return;
> > >> }
> > >> }
> > >> +
> > >> + /*
> > >> + * KVM may be too old to support XIVE, in which case we'd rather try
> > >> + * to use the in-kernel XICS instead of the emulated XIVE.
> > >> + */
> > >> + if (kvm_enabled() && !kvmppc_has_cap_xive() &&
> > >> + spapr->irq == &spapr_irq_dual) {
> > >> + spapr->irq = &spapr_irq_xics;
> > >> + }
> > >> }
> > >>
> > >> /*
> > >>
> > >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature