qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] blk: Add discard=sparse mode


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH] blk: Add discard=sparse mode
Date: Mon, 27 Feb 2017 17:12:47 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1

Hi,

On 27.02.2017 01:45, Samuel Thibault wrote:
> By default, on discard requests, the posix block backend punches holes but
> re-fallocates them to keep the allocated size intact. In some situations
> it is however convenient, when using sparse disk images, to see disk image
> sizes shrink on discard requests.
> 
> This commit adds a discard=sparse mode which does this, by disabling the
> fallocate call.
> 
> Signed-off-by: Samuel Thibault <address@hidden>
> 
> diff --git a/block.c b/block.c
> index b663204f3f..e9cd83210a 100644
> --- a/block.c
> +++ b/block.c
> @@ -665,12 +665,14 @@ static void bdrv_join_options(BlockDriverState *bs, 
> QDict *options,
>   */
>  int bdrv_parse_discard_flags(const char *mode, int *flags)
>  {
> -    *flags &= ~BDRV_O_UNMAP;
> +    *flags &= ~(BDRV_O_UNMAP | BDRV_O_SPARSE);
>  
>      if (!strcmp(mode, "off") || !strcmp(mode, "ignore")) {
>          /* do nothing */
>      } else if (!strcmp(mode, "on") || !strcmp(mode, "unmap")) {
>          *flags |= BDRV_O_UNMAP;
> +    } else if (!strcmp(mode, "sparse")) {
> +        *flags |= BDRV_O_UNMAP | BDRV_O_SPARSE;
>      } else {
>          return -1;
>      }
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4de1abd023..f9efadc5be 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1057,6 +1057,10 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>      BDRVRawState *s = aiocb->bs->opaque;
>  #endif
>  
> +#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || defined(CONFIG_FALLOCATE)
> +    int open_flags = aiocb->bs->open_flags;
> +#endif
> +
>      if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
>          return handle_aiocb_write_zeroes_block(aiocb);
>      }
> @@ -1079,7 +1083,7 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  #endif
>  
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> -    if (s->has_discard && s->has_fallocate) {
> +    if (s->has_discard && (s->has_fallocate || open_flags & BDRV_O_SPARSE)) {

s->has_fallocate has a meaning. I wouldn't try to call do_fallocate() if
s->has_fallocate is false. Therefore, I consider this to effectively be
a no-op.

>          int ret = do_fallocate(s->fd,
>                                 FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                                 aiocb->aio_offset, aiocb->aio_nbytes);
> @@ -1098,7 +1102,8 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  #endif
>  
>  #ifdef CONFIG_FALLOCATE
> -    if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> +    if (s->has_fallocate && !(open_flags & BDRV_O_SPARSE)
> +            && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {

First, this part is only invoked if everything before it has failed.
Second, this condition will only be true if we try to write zeroes
beyond the end of the file. This is a pretty unusual request that
normally doesn't happen (it may happen during qcow2 image creation with
preallocation or something like that).

This part here is just a final fallback for growing files on systems
which do not have any advanced fallocate() modes. It seems very unlikely
to me that anyone would hit this in normal operation.

So all in all I don't know how this patch changes anything.


Also, this function is for writing zeroes, not for handling discards.
That is done in handle_aiocb_discard().

And as far as I can tell, handle_aiocb_discard() does exactly what you
want it to do. It's invoked by raw_aio_pdiscard() and also by
raw_co_pwrite_zeroes() if BDRV_REQ_MAY_UNMAP is set. If that flag is not
set, raw_co_pwrite_zeroes() will use the above zero-writing function.

Unless I'm mistaken, unmap/trim requests from the guest should result in
a discard request in the block layer. This should always trigger
handle_aiocb_discard() here and that should do what you want it to.

I don't know exactly what you are doing so maybe for some reason the
request doesn't arrive as a discard but as a write-zeroes. As I said,
even if that is so, I don't see how this patch then changes the behavior
when compared to discard=unmap.

What might make sense is to make BDRV_O_SPARSE always set
BDRV_REQ_MAY_UNMAP for any zero-write request. But I don't know why that
would be necessary. With virtio-scsi, discard requests from the guest
should result in discard requests in the block layer anyway. And
detect-zeroes=unmap does set BDRV_REQ_MAY_UNMAP for the write-zeroes
requests it generates.


Could you maybe give me the configuration that results in the issue
you're describing in the commit message?

Max

>          int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
> aiocb->aio_nbytes);
>          if (ret == 0 || ret != -ENOTSUP) {
>              return ret;
> diff --git a/include/block/block.h b/include/block/block.h
> index bde5ebda18..103313bee0 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -97,6 +97,8 @@ typedef struct HDGeometry {
>                                        select an appropriate protocol driver,
>                                        ignoring the format layer */
>  #define BDRV_O_NO_IO       0x10000 /* don't initialize for I/O */
> +#define BDRV_O_SPARSE      0x20000 /* make guest UNMAP/TRIM operations make 
> image
> +                                      possibly sparse */
>  
>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
>  
> diff --git a/qemu-nbd.texi b/qemu-nbd.texi
> index 9a84e81eed..8df4f58ddc 100644
> --- a/qemu-nbd.texi
> +++ b/qemu-nbd.texi
> @@ -63,8 +63,8 @@ and @samp{native} (Linux only).
>  @item address@hidden
>  Control whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
>  requests are ignored or passed to the filesystem.  @var{discard} is one of
> address@hidden (or @samp{off}), @samp{unmap} (or @samp{on}).  The default is
> address@hidden
> address@hidden (or @samp{off}), @samp{unmap} (or @samp{on}), or @samp{sparse}.
> +The default is @samp{ignore}.
>  @item address@hidden
>  Control the automatic conversion of plain zero writes by the OS to
>  driver-specific optimized zero write commands.  @var{detect-zeroes} is one of
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bf458f83c3..f5c7455fd1 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -539,7 +539,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>      "       [,serial=s][,addr=A][,rerror=ignore|stop|report]\n"
>      "       
> [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n"
>      "       [,readonly=on|off][,copy-on-read=on|off]\n"
> -    "       [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n"
> +    "       [,discard=ignore|unmap|sparse][,detect-zeroes=on|off|unmap]\n"
>      "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n"
>      "       [[,iops=i]|[[,iops_rd=r][,iops_wr=w]]]\n"
>      "       [[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n"
> @@ -582,7 +582,7 @@ These options have the same definition as they have in 
> @option{-hdachs}.
>  @item address@hidden
>  @var{aio} is "threads", or "native" and selects between pthread based disk 
> I/O and native Linux AIO.
>  @item address@hidden
> address@hidden is one of "ignore" (or "off") or "unmap" (or "on") and 
> controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap}) 
> requests are ignored or passed to the filesystem.  Some machine types may not 
> support discard requests.
> address@hidden is one of "ignore" (or "off") or "unmap" (or "on") or "sparse" 
> and controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap}) 
> requests are ignored or passed to the filesystem.  Some machine types may not 
> support discard requests.
>  @item address@hidden
>  Specify which disk @var{format} will be used rather than detecting
>  the format.  Can be used to specify format=raw to avoid interpreting
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]