[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] qemu-img convert cache mode for source
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] qemu-img convert cache mode for source |
Date: |
Mon, 3 Mar 2014 11:38:43 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
Am 28.02.2014 um 15:35 hat Peter Lieven geschrieben:
> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
> >On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
> >>On 26.02.2014 16:41, Stefan Hajnoczi wrote:
> >>>On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> >>>>I was wondering if it would be a good idea to set the O_DIRECT mode for
> >>>>the source
> >>>>files of a qemu-img convert process if the source is a host_device?
> >>>>
> >>>>Currently the backup of a host device is polluting the page cache.
> >>>Points to consider:
> >>>
> >>>1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
> >>> the file. A fallback is necessary.
> >>>
> >>>2. O_DIRECT has no readahead so performance could actually decrease.
> >>> The question is, how important is reahead versus polluting page
> >>> cache?
> >>>
> >>>3. For raw files it would make sense to tell the kernel that access is
> >>> sequential and data will be used only once. Then we can get the best
> >>> of both worlds (avoid polluting page cache but still get readahead).
> >>> This is done using posix_fadvise(2).
> >>>
> >>> The problem is what to do for image formats. An image file can be
> >>> very fragmented so the readahead might not be a win. Does this mean
> >>> that for image formats we should tell the kernel access will be
> >>> random?
> >>>
> >>> Furthermore, maybe it's best to do readahead inside QEMU so that even
> >>> network protocols (nbd, iscsi, etc) can get good performance. They
> >>> act like O_DIRECT is always on.
> >>your comments are regarding qemu-img convert, right?
> >>How would you implement this? A new open flag because
> >>the fadvise had to goto inside the protocol driver.
> >>
> >>I would start with host_devices first and see how it performs there.
> >>
> >>For qemu-img convert I would issue a FADV_DONTNEED after
> >>a write for the bytes that have been written
> >>(i have tested this with Linux and it seems to work quite well).
> >>
> >>Question is, what is the right paramter for reads? Also FADV_DONTNEED?
> >I think so but this should be justified with benchmark results.
>
> I ran some benchmarks at found that a FADV_DONTNEED issues after
> a read does not hurt regarding to performance. But it avoids buffers
> increasing while I read from a host_device of raw file.
Okay, sounds reasonable.
> As for writing it does only work if I issue a fdatasync after each write, but
> this should be equivalent to O_DIRECT. So I would keep the patch
> to support qemu-img convert sources if they are host_device or file.
Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
at all).
> Here is a proposal for a patch:
>
> diff --git a/block.c b/block.c
> index 2fd5482..2445433 100644
> --- a/block.c
> +++ b/block.c
> @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t
> offset,
> qemu_aio_wait();
> }
> }
> +
> +#ifdef POSIX_FADV_DONTNEED
> + if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
> + bs->drv->bdrv_fadvise && !is_write) {
> + bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
> + }
> +#endif
> +
This #ifdef should be in the raw-posix driver. Please try to keep the
qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
friends as an implementation detail of block drivers.
> diff --git a/include/block/block.h b/include/block/block.h
> index 780f48b..a4dcc3c 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -105,6 +105,9 @@ typedef enum {
> #define BDRV_O_PROTOCOL 0x8000 /* if no block driver is explicitly given:
> select an appropriate protocol driver,
> ignoring the format layer */
> +#define BDRV_O_SEQUENTIAL 0x10000 /* open device for sequential read/write
> */
> +
> +
>
> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB |
> BDRV_O_NO_FLUSH)
Why two additional newlines?
BDRV_O_SEQUENTIAL works for me as the external interface.
Kevin
- Re: [Qemu-devel] qemu-img convert cache mode for source,
Kevin Wolf <=