qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_z


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes
Date: Thu, 15 May 2014 11:54:09 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 15.05.2014 um 07:16 hat Peter Lieven geschrieben:
> Am 14.05.2014 13:41, schrieb Kevin Wolf:
> > Am 08.05.2014 um 18:22 hat Peter Lieven geschrieben:
> >> this patch tries to optimize zero write requests
> >> by automatically using bdrv_write_zeroes if it is
> >> supported by the format.
> >>
> >> This significantly speeds up file system initialization and
> >> should speed zero write test used to test backend storage
> >> performance.
> >>
> >> I ran the following 2 tests on my internal SSD with a
> >> 50G QCOW2 container and on an attached iSCSI storage.
> >>
> >> a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
> >>
> >> QCOW2         [off]     [on]     [unmap]
> >> -----
> >> runtime:       14secs    1.1secs  1.1secs
> >> filesize:      937M      18M      18M
> >>
> >> iSCSI         [off]     [on]     [unmap]
> >> ----
> >> runtime:       9.3s      0.9s     0.9s
> >>
> >> b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
> >>
> >> QCOW2         [off]     [on]     [unmap]
> >> -----
> >> runtime:       246secs   18secs   18secs
> >> filesize:      51G       192K     192K
> >> throughput:    203M/s    2.3G/s   2.3G/s
> >>
> >> iSCSI*        [off]     [on]     [unmap]
> >> ----
> >> runtime:       8mins     45secs   33secs
> >> throughput:    106M/s    1.2G/s   1.6G/s
> >> allocated:     100%      100%     0%
> >>
> >> * The storage was connected via an 1Gbit interface.
> >>   It seems to internally handle writing zeroes
> >>   via WRITESAME16 very fast.
> >>
> >> Signed-off-by: Peter Lieven <address@hidden>
> >> ---
> >> v3->v4: - use QAPI generated enum and lookup table [Kevin]
> >>         - added more details about the options in the comments
> >>           of the qapi-schema [Eric]
> >>         - changed the type of detect_zeroes from str to
> >>           BlockdevDetectZeroesOptions. I left the name
> >>           as is because it is consistent with e.g.
> >>           BlockdevDiscardOptions or BlockdevAioOptions [Eric]
> >>         - changed the parse function in blockdev_init to
> >>           be generic usable for other enum parameters
> >>         
> >> v2->v3: - moved parameter parsing to blockdev_init
> >>         - added per device detect_zeroes status to 
> >>           hmp (info block -v) and qmp (query-block) [Eric]
> >>         - added support to enable detect-zeroes also
> >>           for hot added devices [Eric]
> >>         - added missing entry to qemu_common_drive_opts
> >>         - fixed description of qemu_iovec_is_zero [Fam]
> >>
> >> v1->v2: - added tests to commit message (Markus)
> >> RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
> >>            - fixed typo (choosen->chosen) (Eric)
> >>            - added second example to commit msg
> >>
> >> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline 
> >> parameter
> >>               - call zero detection only for format (bs->file != NULL)
> >>
> >>  block.c                   |   11 ++++++++++
> >>  block/qapi.c              |    1 +
> >>  blockdev.c                |   34 +++++++++++++++++++++++++++++
> >>  hmp.c                     |    5 +++++
> >>  include/block/block_int.h |    1 +
> >>  include/qemu-common.h     |    1 +
> >>  qapi-schema.json          |   52 
> >> ++++++++++++++++++++++++++++++++-------------
> >>  qemu-options.hx           |    6 ++++++
> >>  qmp-commands.hx           |    3 +++
> >>  util/iov.c                |   21 ++++++++++++++++++
> >>  10 files changed, 120 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/block.c b/block.c
> >> index b749d31..aea4c77 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -3244,6 +3244,17 @@ static int coroutine_fn 
> >> bdrv_aligned_pwritev(BlockDriverState *bs,
> >>  
> >>      ret = notifier_with_return_list_notify(&bs->before_write_notifiers, 
> >> req);
> >>  
> >> +    if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
> >> +        !(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
> >> +        drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
> > Pretty long condition. :-)
> >
> > Looks like most is obviously needed, but I wonder what the bs->file part
> > is good for? That looks rather arbitrary.
> 
> What I wanted to achieve is that this condition is only true if we handle
> the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a
> zero write this should always end on the disk and should not be optimizable.

But why? This means setting an arbitrary policy for no good reason. You
already have an option, and it already defaults to off, so unless
someone specifically enables it for bs->file, we don't do the
optimisation. But if someone wants to have it on bs->file, what reason
is there to ignore that request?

> >> +        flags |= BDRV_REQ_ZERO_WRITE;
> >> +        /* if the device was not opened with discard=on the below flag
> >> +         * is immediately cleared again in bdrv_co_do_write_zeroes */
> > Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's
> > not a function that seems to be called from here.
> 
> You are right. Question, do we want to support detect_zeroes = unmap
> if discard = ignore? If yes, I have to update the docs. Otherwise
> I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP.

I think it would be reasonable enough to just error out when you try to
open an image with detect_zeroes=unmap,discard=ignore.

Can these flags be changed during runtime? If so, we need to check there
as well.

> >> +        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
> >> +            flags |= BDRV_REQ_MAY_UNMAP;
> >> +        }
> >> +    }
> >> +
> >>      if (ret < 0) {
> >>          /* Do nothing, write notifier decided to fail this request */
> >>      } else if (flags & BDRV_REQ_ZERO_WRITE) {

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]