[Qemu-devel] Questions about nbd with QIOChannel


From: Changlong Xie
Subject: [Qemu-devel] Questions about nbd with QIOChannel
Date: Thu, 7 Apr 2016 19:04:04 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

Hi all

Recently, while testing COLO, I found that the client sometimes hangs on the Primary side. At first I thought it might be a COLO-related issue, but after a ton of tests I suspect it may be an NBD issue (although I'm not sure). So I'd like to share what I found:

Since commit 1c778ef7, NBD uses the QIOChannel APIs for the actual socket I/O.

Let's focus on nbd_reply_ready() here:

Before commit 1c778ef7
nbd_reply_ready()
  nbd_receive_reply()
    nbd_wr_sync()
    {
     ...
     while (offset < size) {
         if (do_read) {
             len = qemu_recv(fd, buffer + offset, size - offset, 0);
         } else {
             ...
         }
         if (len < 0) {
             err = socket_error();
             if (err == EINTR ||
                 (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                 continue;
             }
             return -err;
         }
         ...
     }
     ....
    }

If len < 0 && err == EAGAIN, there are two possible outcomes:
1) If offset > 0, continue to recv until finished.
2) If offset == 0, return -EAGAIN; nbd_receive_reply() will check this return value and will return *successfully*.
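
To make the two outcomes concrete, here is a minimal sketch that replays the old loop against a scripted socket. struct fake_sock, fake_recv() and old_style_read() are hypothetical names made up for this illustration; only the EINTR/EAGAIN condition is taken from the nbd_wr_sync() excerpt above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical scripted socket: each fake_recv() call consumes one entry.
 * A positive entry is a byte count, a negative entry is -errno. */
struct fake_sock {
    const int *script;
    size_t pos;
    size_t len;
};

static int fake_recv(struct fake_sock *s, size_t want)
{
    int step;

    assert(s->pos < s->len);          /* the script must not run dry */
    step = s->script[s->pos++];
    if (step < 0) {
        return step;                  /* simulated error: -errno */
    }
    return (size_t)step > want ? (int)want : step;
}

/* Same EAGAIN handling as the pre-1c778ef7 loop quoted above. */
static long old_style_read(struct fake_sock *s, size_t size)
{
    size_t offset = 0;

    while (offset < size) {
        int len = fake_recv(s, size - offset);

        if (len < 0) {
            int err = -len;

            if (err == EINTR ||
                (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                continue;             /* case 1: mid-message, keep reading */
            }
            return -err;              /* case 2: nothing read yet, tell caller */
        }
        if (len == 0) {
            break;                    /* peer closed the connection */
        }
        offset += (size_t)len;
    }
    return (long)offset;
}
```

With the script { -EAGAIN } the function returns -EAGAIN right away, so the caller never blocks. With { 4, -EAGAIN, 12 } it retries after the partial read and returns the full 16 bytes.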
                        
After commit 1c778ef7:
nbd_reply_ready()
  read_sync()
    nbd_wr_syncv()
    {
     ...
     while (nlocal_iov > 0) {
         ...
         if (do_read) {
             len = qio_channel_readv(ioc, local_iov, nlocal_iov, &local_err);
         } else {
             ...
         }
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qemu_coroutine_yield();
             } else {
                 qio_channel_wait(ioc,
                                  do_read ? G_IO_IN : G_IO_OUT);
             }
             continue;
         }
         ...
     }
    }

For NBD,
qio_channel_readv()
  qio_channel_readv_full
    klass->io_readv()
     qio_channel_socket_readv()
     {
        for(..) {
            ret = recv(xxx);
            if (ret < 0) {
                if (errno == EAGAIN) {
                    if (done) {
                        return done;
                    } else {
                        return QIO_CHANNEL_ERR_BLOCK;
                    }
                }

            }
            ...
        }
     }

Here, if ret < 0 && errno == EAGAIN && !done, we'll return QIO_CHANNEL_ERR_BLOCK. Then nbd_wr_syncv() will invoke qio_channel_wait() and the guest will *HANG* until I kill the nbd server service.
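
The blocking pattern can be seen outside QEMU with a plain non-blocking socket pair. This is only a sketch under assumptions: poll() with a finite timeout stands in for qio_channel_wait() (which is not implemented this way in QEMU), and would_wait_forever() is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a non-blocking recv() reported EAGAIN and the following
 * wait for readability timed out, 0 if data was available, -1 on error.
 * With an infinite timeout the "1" case would never return. */
static int would_wait_forever(int send_first, int timeout_ms)
{
    int sv[2];
    int flags;
    int result = -1;
    char byte = 'x';
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        goto out;
    }
    if (send_first && send(sv[1], &byte, 1, 0) != 1) {
        goto out;
    }

    n = recv(sv[0], &byte, 1, 0);
    if (n == 1) {
        result = 0;                       /* data was already there */
    } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* Nothing read yet: the new code path waits here for input. */
        struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready == 0) {
            result = 1;                   /* peer stayed silent: we just sit */
        } else if (ready > 0) {
            result = 0;
        }
    }

out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

When the peer sends nothing, the call only comes back because of the timeout; the old code would have returned -EAGAIN to its caller at this point instead of waiting.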

It's easy to reproduce. My question: is the scenario I describe above the expected behavior?

Thanks
    -Xie




