qemu-devel

Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection


From: Yuanhan Liu
Subject: Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
Date: Tue, 29 Mar 2016 22:28:30 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Mar 29, 2016 at 12:52:32PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu
> <address@hidden> wrote:
> > A backend crash may not be that normal in production usage, but it is in
> > the development stage. It would be handy if we could handle that as well,
> > as it's a painful experience for me: every time I do a VM2VM test
> > and my changes to the backend cause a crash (unfortunately, it
> > happens often in the early stage), I have to reboot the two VMs.
> >
> > If we have reconnect, life could be easier. I then don't need to worry
> > that I will mess up the backend and crash it any more.
> >
> > And again, the reason I mention crashes here is not that
> > we need to handle them specially. Instead, I was thinking we might be
> > able to handle both reasons.
> 
> I think a crash could be handled with the queue reset Michael proposed, but
> that would probably require guest side changes.

Is the reply from Michael in this thread the whole story? If not, would
you please give me a link to that discussion?

> 
> >> Conceptually, I think if we allow the backend to disconnect, it makes
> >> sense that qemu is actually the socket server. But it doesn't matter
> >> much, it's simple to teach qemu to reconnect with a timer...
> >
> > We already have that, right? I mean, the char-dev "reconnect" option.
> 
> Sure, what I mean is that it sounds cleaner from a design pov for qemu
> to be the server (since it is the one actually waiting for backend in
> this case),

Yeah, I agree with you on that. However, think of it this way: if QEMU
is the server and the backend is the client, the client still needs to keep
trying to connect to the server when the server crashes/quits/restarts.

My point is that, either way, we need to bear in mind that the server could
also be down (due to crashes/restarts), so we have to teach the client to
reconnect when it gets disconnected. Given that QEMU already has that
support, I'd slightly prefer to let QEMU still be the client and do the
reconnect attempts, if that works.
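
A minimal sketch of what "teaching the client to reconnect" amounts to,
whichever side ends up being the client. It is not QEMU or DPDK code: the
helper name `connect_with_retry`, the fixed retry interval and the
`max_attempts` limit are assumptions made only for illustration.

```python
import socket
import time


def connect_with_retry(path, interval=1.0, max_attempts=None):
    """Keep trying to connect to a Unix socket until the server is back.

    Hypothetical helper for illustration only.
    Returns the connected socket, or None once max_attempts is exhausted
    (max_attempts=None means retry forever).
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError):
            # The server is down or has not created its socket yet:
            # drop this socket, wait, and try again.
            sock.close()
            time.sleep(interval)
    return None
```

The loop is the same no matter which side runs it, which is why reusing
the retry logic QEMU already has would avoid writing it a second time in
the backend.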

And to make it clear, both should work; it's more a question of which
one is the better option (for us, DPDK): QEMU as the server or as the
client?  What's sure here is that, either way, your patches would
work and are required.

> beside a timer is often a pretty rough solution to
> anything.

As stated above, I'm afraid a timer is needed somewhere. It might be in
QEMU or in the backend, though.

> >> So we should
> >> probably allow both cases anyway.
> >
> > Yes, I think both should work. I may be wrong (I may be missing
> > something), but it seems (far) easier to me to keep QEMU
> > as the client and add the "reconnect" option, given that
> > we already have everything needed to make it work. In this way,
> >
> > - if the backend crashes/quits, you just need to restart the backend,
> >   and QEMU will retry the connection.
> >
> > - if QEMU crashes/quits, you just need to restart QEMU, and
> >   then QEMU will start a fresh connection.
> 
> It may all depend on use cases, it's not more obvious or easy than the
> other to me.
> 
> > However, if we let QEMU be the server, there are 2 major pieces of work
> > to be done on the backend side (take DPDK as an example, which
> > acts only as a vhost-user server so far):
> >
> > - Introducing a new API or extending the current API to let it
> >   connect to the server socket from QEMU.
> >
> > - If QEMU crashes/quits, we need to add code to the backend to keep
> >   reconnecting until a connection is established, which is something
> >   similar to the "reconnect" stuff in QEMU.
> >
> > As you can see, it needs more effort (though it's not something
> > you care about :). And it duplicates work.
> 
> 
> Ah, I am looking at this from the qemu angle; the backend may need to adapt
> if it doesn't already handle both "socket roles" (client & server).

Agreed.

> >
> >>
> >> > - start DPDK vhost-switch example
> >> >
> >> > - start QEMU, which will connect to DPDK vhost-user
> >> >
> >> >   link is good now.
> >> >
> >> > - kill DPDK vhost-switch
> >> >
> >> >   link is broken at this stage
> >> >
> >> > - start DPDK vhost-switch again
> >> >
> >> >   you will find that the link is back again.
> >> >
> >> >
> >> > Does that make sense to you? If so, we may need no changes (or just
> >> > very few) to DPDK at all to get reconnect to work.
> >>
> >> The main issue with handling crashes (gone at any time) is that the
> >> backend may not have time to sync the used idx (at the least). It may
> >> already have processed incoming packets, so on reconnect, it may
> >> duplicate the receiving/dispatching work.
> >
> > That's not the case for DPDK backend implementation: incoming packets
> > won't be delivered for processing before we update the used idx.
> 
> It could be ok on incoming packet side (vm->dpdk), but I don't see how
> you could avoid packet loss on the other side (dpdk->vm), since the
> packets must be added to queues before updating used idx.

You could check the return value, i.e. how many packets have been
successfully delivered, and then retry in case of errors. However, I am not
quite sure whether such a disconnect error can be detected.

Thanks.

        --yliu


