Re: [PATCH v8 17/20] multi-process: heartbeat messages to remote
From: Stefan Hajnoczi
Subject: Re: [PATCH v8 17/20] multi-process: heartbeat messages to remote
Date: Wed, 19 Aug 2020 09:00:01 +0100

On Fri, Aug 14, 2020 at 04:01:47PM -0700, Elena Ufimtseva wrote:
> On Tue, Aug 11, 2020 at 03:41:30PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jul 31, 2020 at 02:20:24PM -0400, Jagannathan Raman wrote:
> > > @@ -343,3 +349,49 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
> > > }
> > > }
> > > }
> > > +
> > > +static void hb_msg(PCIProxyDev *dev)
> > > +{
> > > +    DeviceState *ds = DEVICE(dev);
> > > +    Error *local_err = NULL;
> > > +    MPQemuMsg msg = { 0 };
> > > +
> > > +    msg.cmd = PROXY_PING;
> > > +    msg.bytestream = 0;
> > > +    msg.size = 0;
> > > +
> > > +    (void)mpqemu_msg_send_and_await_reply(&msg, dev->ioc, &local_err);
> > > +    if (local_err) {
> > > +        error_report_err(local_err);
> > > +        qio_channel_close(dev->ioc, &local_err);
> > > +        error_setg(&error_fatal, "Lost contact with device %s", ds->id);
> > > +    }
> > > +}
> >
> > Here is my feedback from the last revision. Was this addressed?
> >
>
> Hi Stefan,
>
> Thank you for reviewing the patchset. In this version we decided to
> shutdown the guest when the heartbeat did not get a reply from the
> remote by setting the error_fatal.
> Should we approach it differently or you prefer us to get rid of the
> heartbeat in this form?

I think the only case that this patch handles is when the mpqemu channel
is closed.

The VM hangs when the channel is still open but the remote is
unresponsive. (mpqemu_msg_send_and_await_reply() calls aio_poll() with
the global mutex held so vcpus cannot make progress.)

The heartbeat mechanism needs to handle the case where the other side
isn't responding. It can't hang QEMU.
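
To illustrate what "handle the case where the other side isn't
responding" could mean, here is a minimal sketch. It assumes a plain
socket fd and POSIX poll() instead of QEMU's QIOChannel and event loop.
heartbeat_ping(), enum hb_status and the one-byte ping format are
invented for illustration and are not part of this series. The only
point is that the wait is bounded and that "no reply" is reported
separately from "channel closed":

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes, not taken from the patch under review. */
enum hb_status { HB_OK = 0, HB_TIMEOUT = 1, HB_ERROR = 2 };

/*
 * Send a one-byte ping on fd and wait at most timeout_ms for a one-byte
 * reply. Plain poll() stands in for QEMU's event loop in this sketch.
 */
static enum hb_status heartbeat_ping(int fd, int timeout_ms)
{
    uint8_t ping = 0x01;
    uint8_t pong;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* A negative timeout would make poll() block indefinitely. */
    assert(timeout_ms >= 0);

    /* MSG_NOSIGNAL: report a closed peer as an error, not SIGPIPE. */
    if (send(fd, &ping, sizeof(ping), MSG_NOSIGNAL) != sizeof(ping)) {
        return HB_ERROR;        /* channel closed or broken */
    }

    ret = poll(&pfd, 1, timeout_ms);
    if (ret == 0) {
        return HB_TIMEOUT;      /* channel open, peer unresponsive */
    }
    if (ret < 0) {
        return HB_ERROR;
    }

    /* recv() returning 0 means the peer closed the channel. */
    if (recv(fd, &pong, sizeof(pong), 0) != sizeof(pong)) {
        return HB_ERROR;
    }
    return HB_OK;
}
```

The caller can then decide what to do on HB_TIMEOUT instead of being
stuck in the wait. In QEMU itself a bounded wait alone would presumably
not be enough if it still runs with the global mutex held for the whole
timeout; the sketch does not address that part.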

I suggest dropping this patch. It can be done later.

Stefan