
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support


From: Yuanhan Liu
Subject: Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
Date: Fri, 29 Apr 2016 10:48:35 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

On Fri, Apr 29, 2016 at 12:40:09PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu
> <address@hidden> wrote:
> > On Fri, Apr 01, 2016 at 01:16:21PM +0200, address@hidden wrote:
> >> From: Marc-André Lureau <address@hidden>
> >> +Slave message types
> >> +-------------------
> >> +
> >> + * VHOST_USER_SLAVE_SHUTDOWN:
> >> +      Id: 1
> >> +      Master payload: N/A
> >> +      Slave payload: u64
> >> +
> >> +      Request the master to shutdown the slave. A 0 reply is for
> >> +      success, in which case the slave may close all connections
> >> +      immediately and quit. A non-zero reply cancels the request.
> >> +
> >> +      Before a reply comes, the master may make other requests in
> >> +      order to flush or sync state.
> >
> > Hi all,
> >
> > I put this proposal, as well as DPDK's implementation, to our customers
> > (OVS, Openstack and some other teams) who had made such a request before.
> > I'm sorry to say that none of them really liked that we can't handle a
> > crash. Making reconnect work after a vhost-user backend crash is exactly
> > what they are after.
> 
> Handling crashes is not quite the same as what I propose here.

Yes, I know. However, handling crashes is exactly what our customers
want. And I just want to let you know that; I'm not asking you to do
it :)

> I see
> it as a different goal. But I doubt the usefulness and reliability
> of a backend that crashes.

Agreed with you on that. However, I guess you have to admit that crashes
just happen. The kernel sometimes crashes, too. So it would be great if
we could make the whole stack work again after an unexpected crash (say,
of OVS), without restarting all the guests.

> In many ways, vhost-user was designed after
> kernel vhost, and the qemu code has the same expectations about the kernel
> or the vhost-user backend: many calls are synchronous and will simply
> assert() on unexpected results.

I guess we could at least try to diminish that, if we can't avoid it
completely.
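
Just to illustrate what I mean by "diminish" (this is not actual QEMU
code; the functions and names below are made up), the difference is
between asserting on any backend error and propagating the error so
that a dead backend can be handled once it reconnects:

    #include <assert.h>
    #include <errno.h>

    /* Stand-in for a synchronous vhost_ops->vhost_*() call that talks
     * to the backend over the vhost-user socket. */
    static int fake_set_vring_base(int backend_fd, unsigned int base)
    {
        (void)backend_fd;
        (void)base;
        return -ECONNRESET;          /* pretend the backend just died */
    }

    /* Current style: any backend failure takes the whole VM down. */
    static void start_vring_fatal(int backend_fd)
    {
        int r = fake_set_vring_base(backend_fd, 0);
        assert(r >= 0);
    }

    /* "Diminished" style: report the failure and let the caller retry
     * once the backend has reconnected. */
    static int start_vring_recoverable(int backend_fd)
    {
        int r = fake_set_vring_base(backend_fd, 0);

        if (r < 0) {
            return r;                /* caller defers until reconnect */
        }
        return 0;
    }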

> > And to handle the crash, I was thinking of the proposal from Michael.
> > That is to do reset from the guest OS. This would fix this issue
> > ultimately. However, old kernel will not benefit from this, as well
> > as other guest other than Linux, making it not that useful for current
> > usage.
> 
> Yes, I hope Michael can help with that, I am not very familiar with
> the kernel code.
> 
> > Given that VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
> > to get the vring base (last used idx) from the backend, Huawei suggests
> 
> Right, but after this message, the backend should have flushed all
> pending ring packets and stopped processing them. So it's also a clean
> sync point.
> 
> > that we could still make it in a consistent state after the crash, if
> > we get the vring base from vring->used->idx.  That worked as expected
> 
> You can have a backend that has already processed packets but
> not updated the used idx. You could also have out-of-order packets in
> flight (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a
> clean way to restore this, other than to reset the queues and start over,
> with either packet loss or packet duplication.

Given that it (a crash or restart) happens so rarely, I don't think
that matters. Moreover, doesn't that happen in the real world anyway? :)

> If the backend
> guarantees to process packets in order, it may simplify things, but it
> would be a special case.

Well, it's more of a backend thing: it's up to the backend to try to
set a saner vring base, as stated in the proposal above. Therefore, I
wouldn't say it's a special case.

> 
> > from my test. The only tricky thing might be how to detect a crash,
> > and we could do a simple comparison of the vring base from QEMU with
> > the vring->used->idx at the initialization stage. If a mismatch is
> > found, get it from vring->used->idx instead.
> 
> I don't follow, would the backend restore its last vring->used->idx
> after a crash?

Yes, it is restored from the SET_VRING_BASE sent by QEMU. But that is a
stale value, normally 0 if no start/stop happened before. Therefore, we
can't use it after the crash; instead, we could try to detect the
mismatch and fix it at the SET_VRING_ADDR request.
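
To make that concrete, here is a minimal sketch of what the backend
could do once SET_VRING_ADDR lets it map the used ring again. The
helper and its parameter names are hypothetical (not taken from DPDK or
any existing backend); the struct layout is just the standard virtio
used ring, which lives in guest memory and therefore survives a backend
crash:

    #include <stdint.h>

    /* Standard virtio used-ring layout (see the virtio spec /
     * linux virtio_ring.h). */
    struct vring_used_elem {
        uint32_t id;
        uint32_t len;
    };

    struct vring_used {
        uint16_t flags;
        uint16_t idx;            /* last used index written by the backend */
        struct vring_used_elem ring[];
    };

    /*
     * Hypothetical helper: pick the vring base to resume from after the
     * backend has been restarted.  "base_from_qemu" is the value received
     * in VHOST_USER_SET_VRING_BASE; after an unclean backend exit it is
     * stale (typically 0), because QEMU never got to do GET_VRING_BASE.
     * "used" points at the used ring just mapped after SET_VRING_ADDR.
     */
    static uint16_t pick_vring_base(uint16_t base_from_qemu,
                                    const struct vring_used *used)
    {
        if (base_from_qemu == used->idx) {
            /* The bases agree: either a clean stop/start happened, or
             * nothing was processed yet.  Trust what QEMU told us. */
            return base_from_qemu;
        }

        /* Mismatch: assume the backend crashed after consuming some
         * descriptors.  used->idx is the saner resume point; in-flight
         * or out-of-order descriptors may still be lost or duplicated,
         * as discussed above. */
        return used->idx;
    }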

> 
> > Comments/thoughts? Is it a solid enough solution to you?  If so, we
> > could make things much simpler, and what's most important, we could
> > be able to handle crash.
> 
> That's not so easy: many of the vhost_ops->vhost*() calls are followed by
> assert(r>0), which will be hard to change to handle failure. But I
> would worry first that a backend that crashes may corrupt the VM
> memory too...

Not quite sure I follow this. But the backend only touches the
virtio-related memory, so it should do no harm to the VM?

        --yliu


