Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_rep

From:	Michael S. Tsirkin
Subject:	Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Date:	Sat, 25 Jun 2016 01:38:01 +0300

On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <address@hidden> wrote:
> >> > If it's ok and we can recover, then why should we print errors?
> >>
> >> To me, the current disconnect handling is not handled cleanly. There
> >> is not much/nothing in the protocol that tells when and how you can
> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
> >> by luck. Imho, we should print errors if any call to vhost user fails
> >> to help further analysis of broken behaviour.
> >
> > But we decided disconnect is OK, did we not? So now a failure like this
> > is just expected. It's not broken behaviour anymore ...
> 
> As you know, there are many ways qemu or the vm can break when the
> backend is disconnected.  For now, it would help a lot if we had a bit
> of error messages in this case. But do we really want to support
> spurious disconnect/reconnect at any time? It's going to be
> challenging, to maintain, to test... Is it really worthwhile? I doubt
> it. I think it would be easier and more future-proof to have a
> dedicated vhost-user request for that and only do a best effort for
> the ungraceful disconnect.

ungraceful might take a while. But I don't see a need for
message exchanges for shutdown. Do you?

> >> And we shoud have a
> >> clean mechanism to handle shutdown/disconnect/reconnect.
> >
> >
> > At some level this is something that's missing here.
> >
> > How about we patch docs/specs/vhost-user.txt
> > and describe how is reconnection supposed to work?
> >
> > All we have is:
> >         If Master is unable to send the full message or receives a wrong 
> > reply
> >         it will close the connection. An optional reconnection mechanism 
> > can be
> >         implemented.
> 
> This text is there from the original commit but it doesn't say how to
> reconnect and or how to recover from the ring. I don't have enough
> experience with virtio in general to say when it is acceptable for
> backend to be gone (not processing the ring), I can imagine the
> implementation vary widely, and the requirements depend on the kind of
> device.
> 
> If we limit the spec to vhost-user protocol only and leave it open on
> how each virtio device should support ungraceful disconnect/reconnect,
> then it sounds reasonable to document that after such disconnect, on
> reconnect:
> - the server assumes the backend has no knowledge of previous connection
> - the state can be changed between reconnections (new ring state, mem table 
> etc)
> - the server will check feature compatibility with previous backend
> and reject incompatible backends
> - backend must restore ring processing from used->idx and ignore
> VHOST_USER_SET_VRING_BASE (or should we change and document that in
> this case the ring base is updated from used->idx?)

I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
from used->idx.

> (it's probably not a complete list, I am not good at imagining all
> possible cases)

used ring must be flushed before disconnect (so all entries
before used->idx have been processed, and no entries
after used->idx have been processed).

If enabled, dirty log must be flushed before disconnect.


> 
> -- 
> Marc-André Lureau

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, (continued)
- [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, marcandre . lureau, 2016/06/21
  - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau, 2016/06/24
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin <=
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau, 2016/06/27
- [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, marcandre . lureau, 2016/06/21
  - Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, Marc-André Lureau, 2016/06/23
    - Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, Michael S. Tsirkin, 2016/06/23

Prev by Date: Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
Next by Date: Re: [Qemu-devel] [PATCH 0/1] vhost-user: Add a protocol extension for client responses to vhost commands.
Previous by thread: Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Next by thread: Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Index(es):
- Date
- Thread