qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on


From: Marc-André Lureau
Subject: Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on write error
Date: Fri, 24 Feb 2017 12:59:03 +0000

Hi

On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <address@hidden> wrote:

> From: Anton Nefedov <address@hidden>
>
> Socket backend read handler should normally perform a disconnect, however
> the read handler may not get a chance to run if the frontend is not ready
> (qemu_chr_be_can_write() == 0).
>
> This means that in virtio-serial frontend case if
>  - the host has disconnected (giving EPIPE on socket write)
>  - and the guest has disconnected (-> frontend not ready -> backend
>    will not read)
>  - and there is still data (frontend->backend) to flush (has to be a really
>    tricky timing but nevertheless, we have observed the case in production)
>
> This results in virtio-serial trying to flush this data continiously
> forming
> a busy loop.
>
> Solution: react on write error in the socket write handler.
> errno is not reliable after qio_channel_writev_full(), so we may not get
> the exact EPIPE, so disconnect on any error but QIO_CHANNEL_ERR_BLOCK which
> io_channel_send_full() converts to errno EAGAIN.
> We must not disconnect right away though, there still may be data to read
> (see 4bf1cb0).
>
> Signed-off-by: Anton Nefedov <address@hidden>
> Signed-off-by: Denis V. Lunev <address@hidden>
> CC: Paolo Bonzini <address@hidden>
> CC: Daniel P. Berrange <address@hidden>
> CC: Marc-André Lureau <address@hidden>
> ---
> Changes from v1:
> - we do not rely on EPIPE anynore. Socket should be closed on all errors
>   except EAGAIN
>
>  chardev/char-socket.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 4068dc5..e2137aa 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>                                 GIOCondition cond,
>                                 void *opaque);
>
> +static int tcp_chr_read_poll(void *opaque);
> +static void tcp_chr_disconnect(CharDriverState *chr);
> +
>  /* Called with chr_write_lock held.  */
>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  {
> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t
> *buf, int len)
>              s->write_msgfds_num = 0;
>          }
>
> +        if (ret < 0 && errno != EAGAIN) {
> +            if (tcp_chr_read_poll(chr) <= 0) {
> +                tcp_chr_disconnect(chr);
> +                return len;
>

This change breaks a number of assumption in vhost-user code. Until now, a
vhost-user function assumed that dev->vhost_ops would remain as long as the
function is running, so it may call vhost_ops callbacks several time, which
may eventually fail to do io, but no crash would occur. The disconnection
would be handled later with the HUP handler. Now, vhost_ops may be cleared
during a write (chr_disconnect -> CHR_EVENT_CLOSED in
net_vhost_user_event). This can be randomly reproduced with vhost-user-test
-p /x86_64/vhost-user/flags-mismatch/subprocess  (broken pipe, qemu crashes)

I am trying to fix this, but it isn't simple (races, many code paths etc),
so I just wanted to inform you about it.


+            } /* else let the read handler finish it properly */
> +        }
> +
>          return ret;
>      } else {
>          /* XXX: indicate an error ? */
> --
> 2.7.4
>
>
> --
Marc-André Lureau


reply via email to

[Prev in Thread] Current Thread [Next in Thread]