Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option


From: Chun Yan Liu
Subject: Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
Date: Thu, 26 Jun 2014 21:36:37 -0600

Hi, Stefan & Kevin,

Could you please take a look at this version? We discussed this last
November, and the patch has now been switched to QemuOpts.

Thanks,
Chunyan

>>> On 6/23/2014 at 05:17 PM, in message
<address@hidden>, Chunyan Liu
<address@hidden> wrote: 
> Add a 'nocow' option so that users can set the NOCOW flag on newly created
> files. This is useful on btrfs file systems to improve performance.
>  
> Btrfs performs poorly when hosting VM images, even more so when the guests
> in those VMs also use btrfs as their file system. One way to mitigate this
> is to turn off COW for the VM image files. Generally, there are two ways to
> turn off COW on btrfs: a) mount the file system with nodatacow, so that all
> newly created files are NOCOW; b) add the NOCOW file attribute per file,
> which can only be done for new or empty files.
>  
> This patch takes the second approach: depending on the option, it sets NOCOW
> per file.
>
> For most block drivers, the file is created in raw-posix.c, so the ioctl
> that sets the NOCOW flag only needs to be done in raw-posix.c.
>
> There are some exceptions, such as block/vpc.c and block/vdi.c, which create
> the file by calling qemu_open directly. For those, the same NOCOW ioctl is
> issued in each of them separately.
>  
> Signed-off-by: Chunyan Liu <address@hidden> 
> --- 
> Changes to v2: 
>   * based on QemuOpts instead of old QEMUOptionParameters 
>   * add nocow description in man page and html doc 
>  
>   Old v2 is here: 
>   http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html 
>  
> --- 
>  block/cow.c               |  5 +++++ 
>  block/qcow.c              |  5 +++++ 
>  block/qcow2.c             |  5 +++++ 
>  block/qed.c               | 11 ++++++++--- 
>  block/raw-posix.c         | 25 +++++++++++++++++++++++++ 
>  block/vdi.c               | 29 +++++++++++++++++++++++++++++ 
>  block/vhdx.c              |  5 +++++ 
>  block/vmdk.c              | 11 ++++++++--- 
>  block/vpc.c               | 29 +++++++++++++++++++++++++++++ 
>  include/block/block_int.h |  1 + 
>  qemu-doc.texi             | 16 ++++++++++++++++ 
>  qemu-img.texi             | 16 ++++++++++++++++ 
>  12 files changed, 152 insertions(+), 6 deletions(-) 
>  
> diff --git a/block/cow.c b/block/cow.c 
> index a05a92c..43b537c 100644 
> --- a/block/cow.c 
> +++ b/block/cow.c 
> @@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = { 
>              .type = QEMU_OPT_STRING, 
>              .help = "File name of a base image" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qcow.c b/block/qcow.c 
> index 1f2bac8..5b23540 100644 
> --- a/block/qcow.c 
> +++ b/block/qcow.c 
> @@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = { 
>              .help = "Encrypt the image", 
>              .def_value_str = "off" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qcow2.c b/block/qcow2.c 
> index b9d2fa6..3a4cc8a 100644 
> --- a/block/qcow2.c 
> +++ b/block/qcow2.c 
> @@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = { 
>              .help = "Postpone refcount updates", 
>              .def_value_str = "off" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qed.c b/block/qed.c 
> index 092e6fb..460ac92 100644 
> --- a/block/qed.c 
> +++ b/block/qed.c 
> @@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs) 
>  static int qed_create(const char *filename, uint32_t cluster_size, 
>                        uint64_t image_size, uint32_t table_size, 
>                        const char *backing_file, const char *backing_fmt, 
> -                      Error **errp) 
> +                      QemuOpts *opts, Error **errp) 
>  { 
>      QEDHeader header = { 
>          .magic = QED_MAGIC, 
> @@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
>      int ret = 0; 
>      BlockDriverState *bs; 
>   
> -    ret = bdrv_create_file(filename, NULL, &local_err); 
> +    ret = bdrv_create_file(filename, opts, &local_err); 
>      if (ret < 0) { 
>          error_propagate(errp, local_err); 
>          return ret; 
> @@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
>      } 
>   
>      ret = qed_create(filename, cluster_size, image_size, table_size, 
> -                     backing_file, backing_fmt, errp); 
> +                     backing_file, backing_fmt, opts, errp); 
>   
>  finish: 
>      g_free(backing_file); 
> @@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = { 
>              .type = QEMU_OPT_SIZE, 
>              .help = "L1/L2 table size (in clusters)" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/raw-posix.c b/block/raw-posix.c 
> index dacf4fb..825a0c8 100644 
> --- a/block/raw-posix.c 
> +++ b/block/raw-posix.c 
> @@ -55,6 +55,9 @@ 
>  #include <linux/cdrom.h> 
>  #include <linux/fd.h> 
>  #include <linux/fs.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
>  #endif 
>  #ifdef CONFIG_FIEMAP 
>  #include <linux/fiemap.h> 
> @@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
>      int fd; 
>      int result = 0; 
>      int64_t total_size = 0; 
> +    bool nocow = false; 
>   
>      strstart(filename, "file:", &filename); 
>   
>      /* Read out options */ 
>      total_size = 
>          qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE; 
> +    nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false); 
>   
>      fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 
>                     0644); 
> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
>          result = -errno; 
>          error_setg_errno(errp, -result, "Could not create file"); 
>      } else { 
> +        if (nocow) { 
> +#ifdef __linux__ 
> +            /* Set NOCOW flag to solve performance issue on fs like btrfs.
> +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
> +             * will be ignored since any failure of this operation should not
> +             * block the remaining work.
> +             */
> +            int attr; 
> +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +                attr |= FS_NOCOW_FL; 
> +                ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +            } 
> +#endif 
> +        } 
> + 
>          if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { 
>              result = -errno; 
>              error_setg_errno(errp, -result, "Could not resize file"); 
> @@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = { 
>              .type = QEMU_OPT_SIZE, 
>              .help = "Virtual disk size" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/vdi.c b/block/vdi.c 
> index 01fe22e..197bd77 100644 
> --- a/block/vdi.c 
> +++ b/block/vdi.c 
> @@ -53,6 +53,13 @@ 
>  #include "block/block_int.h" 
>  #include "qemu/module.h" 
>  #include "migration/migration.h" 
> +#ifdef __linux__ 
> +#include <linux/fs.h> 
> +#include <sys/ioctl.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
> +#endif 
>   
>  #if defined(CONFIG_UUID) 
>  #include <uuid/uuid.h> 
> @@ -683,6 +690,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
>      VdiHeader header; 
>      size_t i; 
>      size_t bmap_size; 
> +    bool nocow = false; 
>   
>      logout("\n"); 
>   
> @@ -699,6 +707,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
>          image_type = VDI_TYPE_STATIC; 
>      } 
>  #endif 
> +    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); 
>   
>      if (bytes > VDI_DISK_SIZE_MAX) { 
>          result = -ENOTSUP; 
> @@ -716,6 +725,21 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
>          goto exit; 
>      } 
>   
> +    if (nocow) { 
> +#ifdef __linux__ 
> +        /* Set NOCOW flag to solve performance issue on fs like btrfs.
> +         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
> +         * be ignored since any failure of this operation should not block the
> +         * remaining work.
> +         */
> +        int attr; 
> +        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +            attr |= FS_NOCOW_FL; 
> +            ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +        } 
> +#endif 
> +    } 
> + 
>      /* We need enough blocks to store the given disk size, 
>         so always round up. */ 
>      blocks = (bytes + block_size - 1) / block_size; 
> @@ -818,6 +842,11 @@ static QemuOptsList vdi_create_opts = { 
>              .def_value_str = "off" 
>          }, 
>  #endif 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          /* TODO: An additional option to set UUID values might be useful. */
>          { /* end of list */ } 
>      } 
> diff --git a/block/vhdx.c b/block/vhdx.c 
> index fedcf9f..7bdb456 100644 
> --- a/block/vhdx.c 
> +++ b/block/vhdx.c 
> @@ -1909,6 +1909,11 @@ static QemuOptsList vhdx_create_opts = { 
>             .type = QEMU_OPT_BOOL, 
>             .help = "Force use of payload blocks of type 'ZERO'.  Non-standard."
>         }, 
> +       { 
> +           .name = BLOCK_OPT_NOCOW, 
> +           .type = QEMU_OPT_BOOL, 
> +           .help = "Turn off copy-on-write (valid only on btrfs)" 
> +       }, 
>         { NULL } 
>      } 
>  }; 
> diff --git a/block/vmdk.c b/block/vmdk.c 
> index 83dd6fe..94e1ff7 100644 
> --- a/block/vmdk.c 
> +++ b/block/vmdk.c 
> @@ -1529,7 +1529,7 @@ static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
>   
>  static int vmdk_create_extent(const char *filename, int64_t filesize, 
>                                bool flat, bool compress, bool zeroed_grain, 
> -                              Error **errp) 
> +                              QemuOpts *opts, Error **errp) 
>  { 
>      int ret, i; 
>      BlockDriverState *bs = NULL; 
> @@ -1539,7 +1539,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
>      uint32_t *gd_buf = NULL; 
>      int gd_buf_size; 
>   
> -    ret = bdrv_create_file(filename, NULL, &local_err); 
> +    ret = bdrv_create_file(filename, opts, &local_err); 
>      if (ret < 0) { 
>          error_propagate(errp, local_err); 
>          goto exit; 
> @@ -1845,7 +1845,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
>                  path, desc_filename); 
>   
>          if (vmdk_create_extent(ext_filename, size, 
> -                               flat, compress, zeroed_grain, errp)) { 
> +                               flat, compress, zeroed_grain, opts, errp)) { 
>              ret = -EINVAL; 
>              goto exit; 
>          } 
> @@ -2153,6 +2153,11 @@ static QemuOptsList vmdk_create_opts = { 
>              .help = "Enable efficient zero writes " 
>                      "using the zeroed-grain GTE feature" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/vpc.c b/block/vpc.c 
> index 798d854..8b376a4 100644 
> --- a/block/vpc.c 
> +++ b/block/vpc.c 
> @@ -29,6 +29,13 @@ 
>  #if defined(CONFIG_UUID) 
>  #include <uuid/uuid.h> 
>  #endif 
> +#ifdef __linux__ 
> +#include <linux/fs.h> 
> +#include <sys/ioctl.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
> +#endif 
>   
>  /**************************************************************/ 
>   
> @@ -751,6 +758,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
>      int64_t total_size; 
>      int disk_type; 
>      int ret = -EIO; 
> +    bool nocow = false; 
>   
>      /* Read out options */ 
>      total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0); 
> @@ -767,6 +775,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
>      } else { 
>          disk_type = VHD_DYNAMIC; 
>      } 
> +    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); 
>   
>      /* Create the file */ 
>      fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
> @@ -775,6 +784,21 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
>          goto out; 
>      } 
>   
> +    if (nocow) { 
> +#ifdef __linux__ 
> +        /* Set NOCOW flag to solve performance issue on fs like btrfs.
> +         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
> +         * be ignored since any failure of this operation should not block the
> +         * remaining work.
> +         */
> +        int attr; 
> +        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +            attr |= FS_NOCOW_FL; 
> +            ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +        } 
> +#endif 
> +    } 
> + 
>      /* 
>       * Calculate matching total_size and geometry. Increase the number of 
>       * sectors requested until we get enough (or fail). This ensures that 
> @@ -884,6 +908,11 @@ static QemuOptsList vpc_create_opts = { 
>                  "Type of virtual hard disk format. Supported formats are " 
>                  "{dynamic (default) | fixed} " 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/include/block/block_int.h b/include/block/block_int.h 
> index 7aa2213..4e5022a 100644 
> --- a/include/block/block_int.h 
> +++ b/include/block/block_int.h 
> @@ -54,6 +54,7 @@ 
>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts" 
>  #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type" 
>  #define BLOCK_OPT_REDUNDANCY        "redundancy" 
> +#define BLOCK_OPT_NOCOW             "nocow" 
>   
>  typedef struct BdrvTrackedRequest { 
>      BlockDriverState *bs; 
> diff --git a/qemu-doc.texi b/qemu-doc.texi 
> index 88ec9bb..ad92c85 100644 
> --- a/qemu-doc.texi 
> +++ b/qemu-doc.texi 
> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time. 
>   
>  This option can only be enabled if @code{compat=1.1} is specified. 
>   
> +@item nocow
> +If this option is set to @code{on}, it will turn off COW of the file. It's only
> +valid on btrfs, no effect on other file systems.
> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM is also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid for new or empty files. If there is an existing
> +file which is COW and has data blocks already, it cannot be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (capital 'C' is the NOCOW flag).
> + 
>  @end table 
>   
>  @item qed 
> diff --git a/qemu-img.texi b/qemu-img.texi 
> index c68b541..8496f3b 100644 
> --- a/qemu-img.texi 
> +++ b/qemu-img.texi 
> @@ -474,6 +474,22 @@ check -r all} is required, which may take some time. 
>   
>  This option can only be enabled if @code{compat=1.1} is specified. 
>   
> +@item nocow
> +If this option is set to @code{on}, it will turn off COW of the file. It's only
> +valid on btrfs, no effect on other file systems.
> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM is also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid for new or empty files. If there is an existing
> +file which is COW and has data blocks already, it cannot be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (capital 'C' is the NOCOW flag).
> + 
>  @end table 
>   
>  @item Other 
 