[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2] migration/block: use blk_pwrite_zeroes for e
From: |
Fam Zheng |
Subject: |
Re: [Qemu-devel] [PATCH v2] migration/block: use blk_pwrite_zeroes for each zero cluster |
Date: |
Mon, 10 Apr 2017 09:47:36 +0800 |
User-agent: |
Mutt/1.8.0 (2017-02-23) |
On Sat, 04/08 21:29, 858585 jemmy wrote:
> On Sat, Apr 8, 2017 at 12:52 PM, 858585 jemmy <address@hidden> wrote:
> > On Fri, Apr 7, 2017 at 6:10 PM, Fam Zheng <address@hidden> wrote:
> >> On Fri, 04/07 16:44, address@hidden wrote:
> >>> From: Lidong Chen <address@hidden>
> >>>
> >>> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
> >>> this maybe cause the qcow2 file size is bigger after migration.
> >>> This patch check each cluster, use blk_pwrite_zeroes for each
> >>> zero cluster.
> >>>
> >>> Signed-off-by: Lidong Chen <address@hidden>
> >>> ---
> >>> migration/block.c | 37 +++++++++++++++++++++++++++++++++++--
> >>> 1 file changed, 35 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/migration/block.c b/migration/block.c
> >>> index 7734ff7..c32e046 100644
> >>> --- a/migration/block.c
> >>> +++ b/migration/block.c
> >>> @@ -885,6 +885,11 @@ static int block_load(QEMUFile *f, void *opaque, int
> >>> version_id)
> >>> int64_t total_sectors = 0;
> >>> int nr_sectors;
> >>> int ret;
> >>> + int i;
> >>> + int64_t addr_offset;
> >>> + uint8_t *buf_offset;
> >>
> >> Poor variable names, they are not offset, maybe "cur_addr" and "cur_buf"?
> >> And
> >> they can be moved to the loop block below.
> > ok, i will change.
> >
> >>
> >>> + BlockDriverInfo bdi;
> >>> + int cluster_size;
> >>>
> >>> do {
> >>> addr = qemu_get_be64(f);
> >>> @@ -934,8 +939,36 @@ static int block_load(QEMUFile *f, void *opaque, int
> >>> version_id)
> >>> } else {
> >>> buf = g_malloc(BLOCK_SIZE);
> >>> qemu_get_buffer(f, buf, BLOCK_SIZE);
> >>> - ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
> >>> - nr_sectors * BDRV_SECTOR_SIZE, 0);
> >>> +
> >>> + ret = bdrv_get_info(blk_bs(blk), &bdi);
> >>> + cluster_size = bdi.cluster_size;
> >>> +
> >>> + if (ret == 0 && cluster_size > 0 &&
> >>> + cluster_size < BLOCK_SIZE &&
> >>
> >> I think cluster_size == BLOCK_SIZE should work too.
> > This case the (flags & BLK_MIG_FLAG_ZERO_BLOCK) should be true,
> > and will invoke blk_pwrite_zeroes before apply this patch.
> > but maybe the source qemu maybe not enabled zero flag.
> > so i think cluster_size <= BLOCK_SIZE is ok.
> >
> >>
> >>> + BLOCK_SIZE % cluster_size == 0) {
> >>> + for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
> >>> + addr_offset = addr * BDRV_SECTOR_SIZE
> >>> + + i * cluster_size;
> >>> + buf_offset = buf + i * cluster_size;
> >>> +
> >>> + if (buffer_is_zero(buf_offset, cluster_size)) {
> >>> + ret = blk_pwrite_zeroes(blk, addr_offset,
> >>> + cluster_size,
> >>> + BDRV_REQ_MAY_UNMAP);
> >>> + } else {
> >>> + ret = blk_pwrite(blk, addr_offset,
> >>> buf_offset,
> >>> + cluster_size, 0);
> >>> + }
> >>> +
> >>> + if (ret < 0) {
> >>> + g_free(buf);
> >>> + return ret;
> >>> + }
> >>> + }
> >>> + } else {
> >>> + ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
> >>> + nr_sectors * BDRV_SECTOR_SIZE, 0);
> >>> + }
> >>> g_free(buf);
> >>> }
> >>>
> >>> --
> >>> 1.8.3.1
> >>>
> >>
> >> Is it possible use (source) cluster size as the transfer chunk size,
> >> instead of
> >> BDRV_SECTORS_PER_DIRTY_CHUNK? Then the existing BLK_MIG_FLAG_ZERO_BLOCK
> >> logic
> >> can help and you don't need to send zero bytes on the wire. This may still
> >> not
> >> be optimal if dest has larger cluster, but it should cover the common use
> >> case
> >> well.
> >
> > yes, i also think BDRV_SECTORS_PER_DIRTY_CHUNK is too large.
> > This have two disadvantage:
> > 1. it will cause the dest qcow2 file size is bigger after migration.
> > 2. it will cause transfer not necessary data, and maybe cause the
> > migration can't be successful.
> >
> > in my production environment, some vm only write 2MB/s, the dirty
> > block migrate speed is 70MB/s.
> > but it still migration timeout.
> >
> > but if we change the size of BDRV_SECTORS_PER_DIRTY_CHUNK, it will
> > break the protocol.
> > the old version qemu will not be able to migrate to new version qemu.
> > there are not information about the length about the migration buffer.
> >
> > so i think we should add new flags to indicate that there are an
> > additional byte about the length
> > of migration buffer. i will send another patch later, and test the result.
>
> Hi Fam:
> Do we need consider the circumstances than migrate from new qemu version
> to old qemu version?
Yes, usually we use a subsection to achieve that - missing the "chunk size"
field should result in using the old BDRV_SECTORS_PER_DIRTY_CHUNK value.
Fam
>
> >
> > this patch is also valuable, there are many old version qemu in my
> > production environment.
> > and will be benefit with this patch.
> >
> >>
> >> Fam
>