
Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
Date: Thu, 4 Dec 2014 16:03:02 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 04.12.2014 at 15:42, Peter Lieven wrote:
> On 04.12.2014 15:12, Kevin Wolf wrote:
> >On 02.12.2014 at 15:33, Peter Lieven wrote:
> >>this patch finally introduces multiread support to virtio-blk. While
> >>multiwrite support has been there for a long time, read support was missing.
> >>
> >>To achieve this the patch does several things which might need further
> >>explanation:
> >>
> >>  - the whole merge and multireq logic is moved from block.c into
> >>    virtio-blk. This move is a preparation for directly creating a
> >>    coroutine out of virtio-blk.
> >>
> >>  - requests are only merged if they are strictly sequential, and they are
> >>    no longer sorted. This simplification decreases overhead and reduces
> >>    latency. It will also merge some requests which were unmergeable before.
> >>
> >>    The old algorithm took up to 32 requests, sorted them and tried to merge
> >>    them. The outcome was anything between 1 and 32 requests. In the case of
> >>    32 requests there were 31 requests unnecessarily delayed.
> >>
> >>    On the other hand, let's imagine e.g. 16 unmergeable requests followed
> >>    by 32 mergeable requests. The latter 32 requests would have been split
> >>    into two merged requests of 16 requests each.
> >>
> >>    Lastly, the simplified logic allows for a fast path if we have only a
> >>    single request in the multirequest. In this case the request is sent as
> >>    an ordinary request without multireq callbacks.
> >>
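A minimal sketch of that merge rule (not code from this patch; the helper name
is made up, the fields follow the MultiReqBuffer used below, and reads and
writes are assumed to be collected in separate buffers as the dataplane hunk
does):

    /* Illustrative only: accept a new request into the buffer only if it
     * starts exactly where the buffer currently ends -- no sorting, no
     * gap filling. */
    static bool multireq_can_merge(const MultiReqBuffer *mrb,
                                   int64_t sector_num)
    {
        return mrb->num_reqs == 0 ||
               sector_num == mrb->sector_num + mrb->nb_sectors;
    }

The fast path then falls out naturally: if only one request ends up in the
buffer, it is submitted directly with virtio_blk_rw_complete() as its
callback, as virtio_submit_multireq() below does.
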
> >>As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number
> >>of merged requests is in the same order while the latency is slightly
> >>decreased. One should not read too much into the numbers because the
> >>number of wr_requests fluctuates a lot. I hope the numbers show that this
> >>patch is at least not causing any big harm:
> >>
> >>Cmdline:
> >>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
> >>  -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio
> >>
> >>Before:
> >>virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216
> >>          flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432
> >>          flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453
> >>
> >>After:
> >>virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197
> >>          flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283
> >>          flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859
> >>
> >>Some first numbers of improved read performance while booting:
> >>
> >>The Ubuntu 14.04.1 vServer from above:
> >>virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
> >>          flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
> >>          flush_total_time_ns=288776 rd_merged=1297 wr_merged=2
> >>
> >>Windows 2012R2:
> >>virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
> >>          flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
> >>          flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> >>
> >>Signed-off-by: Peter Lieven <address@hidden>
> >>---
> >>  hw/block/dataplane/virtio-blk.c |   10 +-
> >>  hw/block/virtio-blk.c           |  222 ++++++++++++++++++++++++---------------
> >>  include/hw/virtio/virtio-blk.h  |   23 ++--
> >>  3 files changed, 156 insertions(+), 99 deletions(-)
> >>
> >>diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> >>index 1222a37..aa4ad91 100644
> >>--- a/hw/block/dataplane/virtio-blk.c
> >>+++ b/hw/block/dataplane/virtio-blk.c
> >>@@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
> >>      event_notifier_test_and_clear(&s->host_notifier);
> >>      blk_io_plug(s->conf->conf.blk);
> >>      for (;;) {
> >>-        MultiReqBuffer mrb = {
> >>-            .num_writes = 0,
> >>-        };
> >>+        MultiReqBuffer mrb_rd = {};
> >>+        MultiReqBuffer mrb_wr = {.is_write = 1};
> >>          int ret;
> >>          /* Disable guest->host notifies to avoid unnecessary vmexits */
> >>@@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
> >>                                                          req->elem.in_num,
> >>                                                          req->elem.index);
> >>-            virtio_blk_handle_request(req, &mrb);
> >>+            virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
> >>          }
> >>-        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
> >>+        virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
> >>+        virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
> >>          if (likely(ret == -EAGAIN)) { /* vring emptied */
> >>              /* Re-enable guest->host notifies and stop processing the vring.
> >>diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >>index 490f961..f522bfc 100644
> >>--- a/hw/block/virtio-blk.c
> >>+++ b/hw/block/virtio-blk.c
> >>@@ -22,12 +22,15 @@
> >>  #include "dataplane/virtio-blk.h"
> >>  #include "migration/migration.h"
> >>  #include "block/scsi.h"
> >>+#include "block/block_int.h"
> >No. :-)
> >
> >We could either expose the information that you need through
> >BlockBackend, or wait until your automatic request splitting logic is
> >implemented in block.c and the information isn't needed here any more.
> 
> I will add something to block.c and follow up with the request splitting
> logic later. It will still make sense to know the limit here. Otherwise we
> merge something that we would split again afterwards.

Yup, but then make it an official block layer interface instead of
including block_int.h in the device emulation.
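Such an interface could be a one-line accessor; the following is only a
sketch with assumed names (neither blk_get_max_transfer_length() nor a
max_transfer_length field in BlockLimits exists at this point in the tree):

    /* Hypothetical accessor, e.g. in block/block-backend.c, so that device
     * emulation never needs to include block_int.h itself.  Returns the
     * backend's maximum transfer length in sectors, 0 if unlimited/unknown. */
    int blk_get_max_transfer_length(BlockBackend *blk)
    {
        BlockDriverState *bs = blk_bs(blk);

        return bs ? bs->bl.max_transfer_length : 0;
    }

virtio-blk could then stop extending a merge once adding another request
would exceed that limit, instead of merging something that gets split again
afterwards.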

> >
> >>  #ifdef __linux__
> >>  # include <scsi/sg.h>
> >>  #endif
> >>  #include "hw/virtio/virtio-bus.h"
> >>  #include "hw/virtio/virtio-access.h"
> >>+/* #define DEBUG_MULTIREQ */
> >>+
> >>  VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
> >>  {
> >>      VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
> >>@@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
> >>      trace_virtio_blk_rw_complete(req, ret);
> >>+#ifdef DEBUG_MULTIREQ
> >>+    printf("virtio_blk_rw_complete p %p ret %d\n",
> >>+           req, ret);
> >>+#endif
> >In the final version, please use tracepoints instead of printfs.
> 
> of course. This is just my debug code.
> 
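For reference, the tracepoint version of such a debug printf is a new entry
in the trace-events file plus a call to the generated trace function; the
event name and arguments below are only an example:

    /* trace-events (example entry, name chosen for illustration):
     * virtio_blk_submit_multireq(void *mrb, uint64_t sector, int nb_sectors, int is_write, int num_reqs) "mrb %p sector %"PRIu64" nb_sectors %d is_write %d num_reqs %d"
     */

    /* hw/block/virtio-blk.c, replacing the printf: */
    trace_virtio_blk_submit_multireq(mrb, mrb->sector_num, mrb->nb_sectors,
                                     mrb->is_write, mrb->num_reqs);
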
> >
> >>      if (ret) {
> >>          int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
> >>          bool is_read = !(p & VIRTIO_BLK_T_OUT);
> >>@@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
> >>      virtio_blk_free_request(req);
> >>  }
> >>-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
> >>+static void virtio_multireq_cb(void *opaque, int ret)
> >>+{
> >>+    MultiReqBuffer *mrb = opaque;
> >>+    int i;
> >>+#ifdef DEBUG_MULTIREQ
> >>+    printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write 
> >>%d num_reqs %d\n",
> >>+           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, 
> >>mrb->num_reqs);
> >>+#endif
> >>+    for (i = 0; i < mrb->num_reqs; i++) {
> >>+        virtio_blk_rw_complete(mrb->reqs[i], ret);
> >>+    }
> >>+
> >>+    qemu_iovec_destroy(&mrb->qiov);
> >>+    g_free(mrb);
> >>+}
> >>+
> >>+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0)
> >>  {
> >>-    int i, ret;
> >>+    MultiReqBuffer *mrb = NULL;
> >>-    if (!mrb->num_writes) {
> >>+    if (!mrb0->num_reqs) {
> >>          return;
> >>      }
> >>-    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
> >>-    if (ret != 0) {
> >>-        for (i = 0; i < mrb->num_writes; i++) {
> >>-            if (mrb->blkreq[i].error) {
> >>-                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
> >>-            }
> >>+    if (mrb0->num_reqs == 1) {
> >>+        if (mrb0->is_write) {
> >>+            blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> >>+                           virtio_blk_rw_complete, mrb0->reqs[0]);
> >>+        } else {
> >>+            blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> >>+                          virtio_blk_rw_complete, mrb0->reqs[0]);
> >>          }
> >>+        qemu_iovec_destroy(&mrb0->qiov);
> >>+        mrb0->num_reqs = 0;
> >>+        return;
> >>+    }
> >>+
> >>+    mrb = g_malloc(sizeof(MultiReqBuffer));
> >>+    memcpy(mrb, mrb0, sizeof(MultiReqBuffer));
> >>+    mrb0->num_reqs = 0;
> >We probably want to avoid allocations as much as we can. If at all
> >possible, we don't want to memcpy() either.
> >
> >Most parts of MultiReqBuffer don't seem to be actually needed during the
> >request and in the callback any more. Essentially, all that is needed is
> >the qiov and a way to find all the requests that belong to the same
> >MultiReqBuffer, so that they can all be completed.
> >
> >We could create a linked list by having a VirtIOBlockReq *next in the
> >request struct. For the qiov we can probably just use the field in the
> >first request (i.e. we extend the qiov of the first request).
> 
> That's a good idea and should work. I will look into this.
> But where would you store the merged qiov?
> 
> It looks like I would need a small struct which holds the qiov and the
> pointer to req[0]. This can then be used as the opaque value to the
> callback.

My thought was that you replace the original qiov of the first request
with the merged one. I don't think you need the original one any more
after merging the requests.  Alternatively, you could use a separate
field in VirtIOBlockReq which stays unused in all but the first request.

In any case the goal is to get rid of a dynamically allocated struct, no
matter how small, so opaque must be directly pointing to a request in
the end.
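
A rough sketch of the completion path under that scheme (mr_next is an
assumed new field in VirtIOBlockReq; the merged qiov has replaced the head
request's own qiov):

    /* Sketch only: complete every request of a merged chain.  The opaque
     * pointer is the head VirtIOBlockReq; each request points at the one
     * merged behind it through a new mr_next field. */
    static void virtio_blk_multireq_complete(void *opaque, int ret)
    {
        VirtIOBlockReq *req = opaque;

        while (req) {
            VirtIOBlockReq *next = req->mr_next; /* completing req frees it */
            virtio_blk_rw_complete(req, ret);
            req = next;
        }
    }

With this, virtio_submit_multireq() can pass the head request as the opaque
value and nothing needs to be allocated on the heap.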

> >With these requirements removed, MultiReqBuffer can live on the stack
> >and doesn't need to be copied to the heap.
> >
> >
> >With the additional patch that you send me, the next few lines read:
> >
> >+    qemu_iovec_init(&mrb->qiov, mrb->num_reqs);
> >+
> >+    for (i = 0; i < mrb->num_reqs; i++) {
> >+        qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, mrb->reqs[i]->qiov.size);
> >+    }
> >+    assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);
> >
> >The long line is more than 80 characters, but probably more importantly,
> >mrb->num_reqs is a really bad estimate for niov. If the guest uses any
> >vectored I/O, we're sure to get reallocations. What you really want to
> >use here is the old mrb->qiov.niov (although storing it in the qiov is an
> >abuse, see below).
> 
> Of course, my fault. I was so happy to have an exact value for the number
> of niov that I put it in the wrong variable ;-)

Happens. :-)
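
Concretely, the fixed-up initialisation could look roughly like this,
assuming the buffer carries an explicit counter (called merged_niov here, an
assumed name) that is incremented by each request's qiov.niov as the request
is queued:

    /* Sketch: size the merged qiov with the exact iovec count collected
     * while queuing, so qemu_iovec_concat() never has to reallocate. */
    qemu_iovec_init(&mrb->qiov, mrb->merged_niov);

    for (i = 0; i < mrb->num_reqs; i++) {
        qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov,
                          0, mrb->reqs[i]->qiov.size);
    }
    assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);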

> >(I also wonder whether it would make sense to keep qiovs allocated when
> >putting VirtIOBlockReqs into the pool, but that's not for this series.)
> >
> >>+
> >>+#ifdef DEBUG_MULTIREQ
> >>+    printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d 
> >>is_write %d num_reqs %d\n",
> >>+           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, 
> >>mrb->num_reqs);
> >>+#endif
> >>+
> >>+    if (mrb->is_write) {
> >>+        blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> >>+                       virtio_multireq_cb, mrb);
> >>+    } else {
> >>+        blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> >>+                      virtio_multireq_cb, mrb);
> >>      }
> >>-    mrb->num_writes = 0;
> >>+    block_acct_merge_done(blk_get_stats(blk),
> >>+                          mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
> >>+                          mrb->num_reqs - 1);
> >>  }
> >>  static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)

Kevin


