qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] qemu-img convert cache mode for source


From: Peter Lieven
Subject: Re: [Qemu-devel] qemu-img convert cache mode for source
Date: Mon, 03 Mar 2014 12:20:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 03.03.2014 11:38, Kevin Wolf wrote:
Am 28.02.2014 um 15:35 hat Peter Lieven geschrieben:
On 27.02.2014 09:57, Stefan Hajnoczi wrote:
On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
On 26.02.2014 16:41, Stefan Hajnoczi wrote:
On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
I was wondering if it would be a good idea to set the O_DIRECT mode for the 
source
files of a qemu-img convert process if the source is a host_device?

Currently the backup of a host device is polluting the page cache.
Points to consider:

1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
    the file.  A fallback is necessary.

2. O_DIRECT has no readahead so performance could actually decrease.
    The question is, how important is reahead versus polluting page
    cache?

3. For raw files it would make sense to tell the kernel that access is
    sequential and data will be used only once.  Then we can get the best
    of both worlds (avoid polluting page cache but still get readahead).
    This is done using posix_fadvise(2).

    The problem is what to do for image formats.  An image file can be
    very fragmented so the readahead might not be a win.  Does this mean
    that for image formats we should tell the kernel access will be
    random?

    Furthermore, maybe it's best to do readahead inside QEMU so that even
    network protocols (nbd, iscsi, etc) can get good performance.  They
    act like O_DIRECT is always on.
your comments are regarding qemu-img convert, right?
How would you implement this? A new open flag because
the fadvise had to goto inside the protocol driver.

I would start with host_devices first and see how it performs there.

For qemu-img convert I would issue a FADV_DONTNEED after
a write for the bytes that have been written
(i have tested this with Linux and it seems to work quite well).

Question is, what is the right paramter for reads? Also FADV_DONTNEED?
I think so but this should be justified with benchmark results.
I ran some benchmarks at found that a FADV_DONTNEED issues after
a read does not hurt regarding to performance. But it avoids buffers
increasing while I read from a host_device of raw file.
Okay, sounds reasonable.

As for writing it does only work if I issue a fdatasync after each write, but
this should be equivalent to O_DIRECT. So I would keep the patch
to support qemu-img convert sources if they are host_device or file.
Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
at all).
of course, fdatasync is no option. I just wanted to point out that one
should use O_DIRECT if the pagecache should be disable for the output
file.

Here is a proposal for a patch:

diff --git a/block.c b/block.c
index 2fd5482..2445433 100644
--- a/block.c
+++ b/block.c
@@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
offset,
              qemu_aio_wait();
          }
      }
+
+#ifdef POSIX_FADV_DONTNEED
+    if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
+        bs->drv->bdrv_fadvise && !is_write) {
+        bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
+    }
+#endif
+
This #ifdef should be in the raw-posix driver. Please try to keep the
qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
friends as an implementation detail of block drivers.
I had the same idee, but as far as I see the callback to the completes
request is handled in block.c. raw-posix uses the aio interface and
not coroutines.

diff --git a/include/block/block.h b/include/block/block.h
index 780f48b..a4dcc3c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -105,6 +105,9 @@ typedef enum {
  #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
                                        select an appropriate protocol driver,
                                        ignoring the format layer */
+#define BDRV_O_SEQUENTIAL 0x10000  /* open device for sequential read/write */
+
+

  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | 
BDRV_O_NO_FLUSH)
Why two additional newlines?
typo. this patch war basically an RFC :-)

BDRV_O_SEQUENTIAL works for me as the external interface.

Kevin

Peter



reply via email to

[Prev in Thread] Current Thread [Next in Thread]