Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_rep

From:	Marc-André Lureau
Subject:	Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Date:	Fri, 24 Jun 2016 15:25:30 +0200

On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <address@hidden> wrote:
>> > If it's ok and we can recover, then why should we print errors?
>>
>> To me, the current disconnect handling is not handled cleanly. There
>> is not much/nothing in the protocol that tells when and how you can
>> disconnect. If qemu vhost-user reconnection works today, it is mostly
>> by luck. Imho, we should print errors if any call to vhost user fails
>> to help further analysis of broken behaviour.
>
> But we decided disconnect is OK, did we not? So now a failure like this
> is just expected. It's not broken behaviour anymore ...

As you know, there are many ways qemu or the vm can break when the
backend is disconnected.  For now, it would help a lot if we had a bit
of error messages in this case. But do we really want to support
spurious disconnect/reconnect at any time? It's going to be
challenging, to maintain, to test... Is it really worthwhile? I doubt
it. I think it would be easier and more future-proof to have a
dedicated vhost-user request for that and only do a best effort for
the ungraceful disconnect.

>> And we shoud have a
>> clean mechanism to handle shutdown/disconnect/reconnect.
>
>
> At some level this is something that's missing here.
>
> How about we patch docs/specs/vhost-user.txt
> and describe how is reconnection supposed to work?
>
> All we have is:
>         If Master is unable to send the full message or receives a wrong reply
>         it will close the connection. An optional reconnection mechanism can 
> be
>         implemented.

This text is there from the original commit but it doesn't say how to
reconnect and or how to recover from the ring. I don't have enough
experience with virtio in general to say when it is acceptable for
backend to be gone (not processing the ring), I can imagine the
implementation vary widely, and the requirements depend on the kind of
device.

If we limit the spec to vhost-user protocol only and leave it open on
how each virtio device should support ungraceful disconnect/reconnect,
then it sounds reasonable to document that after such disconnect, on
reconnect:
- the server assumes the backend has no knowledge of previous connection
- the state can be changed between reconnections (new ring state, mem table etc)
- the server will check feature compatibility with previous backend
and reject incompatible backends
- backend must restore ring processing from used->idx and ignore
VHOST_USER_SET_VRING_BASE (or should we change and document that in
this case the ring base is updated from used->idx?)

(it's probably not a complete list, I am not good at imagining all
possible cases)

-- 
Marc-André Lureau

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH 08/24] vhost-user: return a read error, (continued)
- [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, marcandre . lureau, 2016/06/21
  - Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, Marc-André Lureau, 2016/06/23
    - Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error, Marc-André Lureau, 2016/06/24
- [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, marcandre . lureau, 2016/06/21
  - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau <=
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Michael S. Tsirkin, 2016/06/24
    - Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail, Marc-André Lureau, 2016/06/27
- [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success, marcandre . lureau, 2016/06/21
- [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, marcandre . lureau, 2016/06/21
  - Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, Michael S. Tsirkin, 2016/06/23
    - Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value, Marc-André Lureau, 2016/06/23

Prev by Date: Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
Next by Date: [Qemu-devel] [PATCH 05/17] s390x/pci: refactor s390_pci_find_dev_by_fh
Previous by thread: Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Next by thread: Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Index(es):
- Date
- Thread