qemu-devel

Re: [PATCH 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller


From: Peter Xu
Subject: Re: [PATCH 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller
Date: Thu, 23 Jun 2022 12:55:37 -0400

On Thu, Jun 23, 2022 at 02:09:43PM +0100, Peter Maydell wrote:
> On Fri, 17 Jun 2022 at 15:53, Peter Xu <peterx@redhat.com> wrote:
> >
> > Leverage the new mechanism to pass over errors to upper stack for
> > kvm_arch_put_registers() when called for the post_init() accel hook.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  accel/kvm/kvm-all.c  | 13 ++++++++++---
> >  accel/kvm/kvm-cpus.h |  2 +-
> >  softmmu/cpus.c       |  5 ++++-
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> Checking for errors definitely does seem like the right thing to do.
> That said:
> 
> (1) Why do we want to check for errors only on the call
> for post_init synchronize, and not any of the other places
> where we call kvm_arch_put_registers()?

Because that's the only call I know we need in order to keep live migration
honest about whether it succeeded, and I didn't want to spread the change
elsewhere, for lack of both knowledge and time.  So I wanted to keep the
series simple but useful.

If there are reasons to cover some of the other call sites, I can still try.
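
The split described above can be sketched in plain C: errors are checked on the post_init sync only, and the other call sites keep ignoring them. This is a hypothetical illustration, not the code from the patch. `fake_put_registers()`, `sync_post_reset()`, `sync_post_init()` and `finish_incoming_migration()` are made-up names. A simple message buffer stands in for QEMU's own error-reporting machinery, so none of the real QEMU signatures are reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for kvm_arch_put_registers(): returns 0 on success
 * or a negative errno. 'host_rejects' simulates a host kernel that refuses
 * the register write (as in the M1 case mentioned in this thread). */
static int fake_put_registers(bool host_rejects)
{
    return host_rejects ? -EINVAL : 0;
}

/* Unchanged call sites (e.g. a post-reset sync): the result is discarded,
 * so a failure stays "benign" and invisible to the caller. */
static void sync_post_reset(bool host_rejects)
{
    (void)fake_put_registers(host_rejects);
}

/* The post_init path after the change: the error is reported upward.
 * Returns true on success; on failure, writes a message into errbuf. */
static bool sync_post_init(bool host_rejects, char *errbuf, size_t len)
{
    assert(errbuf != NULL && len > 0);
    int ret = fake_put_registers(host_rejects);
    if (ret < 0) {
        snprintf(errbuf, len, "failed to put registers: %s", strerror(-ret));
        return false;
    }
    return true;
}

/* Hypothetical incoming-migration caller: a failed post_init sync fails the
 * migration instead of resuming the guest with stale vCPU state. */
static int finish_incoming_migration(bool host_rejects)
{
    char err[128] = "";
    if (!sync_post_init(host_rejects, err, sizeof(err))) {
        fprintf(stderr, "migration failed: %s\n", err);
        return -1;
    }
    return 0;
}
```

With a host that rejects the write, `sync_post_reset(true)` still returns quietly, but `finish_incoming_migration(true)` returns -1. That is the intended effect: only the migration path starts to fail loudly.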

> 
> (2) I suspect this will uncover some situations where we've
> been happening-to-work because we ignore an error, and now
> will start to actively fail. But I guess there's not much
> we can do about that except say "we'll fix them as we encounter
> bug reports about them". (I know of at least one: on the
> Mac M1 running Linux, if the host doesn't have this kernel fix:
> https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@google.com/T/
> then the first put_registers will fail (mostly harmlessly).
> I think that's the post_init sync but it might be the post_reset
> one.)

From what I read in the commit message at the link, hopefully that only
affects the reset process, since it sounds like a mismatch between the
registers before and after the GIC is created.  When migration completes, I
guess we're always in the after-GIC-created state?  But it would be great to
double-check.  Luckily it seems to affect only the M1.

What my series aims to achieve is to not affect anything other than
migration (so calls that fail elsewhere keep failing benignly).  What I'm
worried about is exactly the case where we have benign put-registers
failures in live migration use cases, but hopefully there are none.

Thanks,

-- 
Peter Xu



