[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH 00/21] new backup architecture
From: |
Jeff Cody |
Subject: |
Re: [Qemu-block] [PATCH 00/21] new backup architecture |
Date: |
Tue, 10 Jan 2017 01:05:21 -0500 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
On Mon, Jan 09, 2017 at 11:04:27AM +0000, Stefan Hajnoczi wrote:
> On Fri, Dec 23, 2016 at 05:28:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>
> Jeff or John: are you reviewing this?
It's in my review queue, but it would probably be a good one for John to
review as well if he has time.
>
> > This is a new architecture for backup. It solves some current problems:
> > 1. intersecting requests: for now at request start we wait for all
> > intersecting requests, which means that
> > a. we may wait even for unrelated to our request clusters
> > b. not full async: if we are going to copy clusters 1,2,3,4, when 2 and
> > 4 are in flight, why should we wait for 2 and 4 to be fully copied? Why not
> > to start 1 and 3 in parallel with 2 and 4?
> >
> > 2. notifier request is internally synchronous: if notifier starts copying
> > clusters 1,2,3,4, they will be copied one by one in synchronous loop.
> >
> > 3. notifier wait full copying of corresponding clusters (when actually it
> > may wait only for _read_ operations to be finished)
> >
> > In short, what is done:
> > 1. full async scheme
> > 4. no intersecting requests
> > 3. notifiers can wait only for read, not for write
> > 4. notifiers wait only for corresponding clusters
> > 5. time limit for notifiers
> > 5. skip unallocated clusters for full mode
> > 6. use HBitmap as main backup bitmap and just init it from dirty bitmap for
> > incremental case
> > 7. retrying: do not reread on write fail
> >
> > # Intro
> >
> > Instead of sync-copying + async-notifiers as in old backup, or aio requests
> > like in mirror, this scheme just start 24 workers - separate coroutines,
> > each of them copying clusters synchronously. Copying is only done by one
> > cluster, there are no large requests.
> > The only difference for clusters, awaited by write notifiers, is larger
> > priority. So, notifiers do not start io requests, they just mark some
> > clusters as awaited and yield. Then, when some worker completes read of
> > last cluster, awaited by this notifier it will enter it.
> >
> > # Some data structures
> >
> > Instead of done_bitmap - copy_bitmap, like in mirror.
> > HBitmap copy_bitmap
> > Exactly, what should be copied:
> > 0 - may mean one of three things:
> > - this cluster should not be copied at all
> > - this cluster is in flight
> > - this cluster is already copied
> > 1 - means that cluster should be copied, but not touched yet (no async
> > io exists for it)
> >
> > New bitmap: notif_wait_bitmap - not HBitmap, just Bitmap.
> > Exactly, in flight clusters, waiting for read operation:
> > 0 - may mean one of three things:
> > - this cluster should not be copied at all
> > - this cluster is in flight and it is _already_ read to memory
> > - this cluster is already copied
> > 1 - means that cluster is in flight, but read operation have not
> > finished
> > yet
> > The only exception is none-mode: in this case 1 means in flight: in io
> > read or write. This is needed for image fleecing.
> >
> > Cluster states (copy_bitmap, notif_wait_bitmap)
> >
> > 0, 0 - Ignored (should not be copied at all) or In flight (read done) or
> > Copied
> > 0, 1 - In flight, read operation not finished (or write op. - for none-mode)
> > 1, 0 - Should be copied, but not touched yet
> > 1, 1 - Impossible state
> >
> > NotifierRequest - request from notifier, it changes sequence of cluster
> > copying by workers.
> > NotifierRequest {
> > int64_t start;
> > int64_t end;
> > int nb_wait; // nb clusters (in specified range) that should be copied
> > but not already read, i.e. clusters awaited by this notifier
> > Coroutine *notif; // corresponding notifier coroutine
> > }
> >
> > notifier_reqs - list of notifier requests
> >
> > # More info
> >
> > At backup start copy_bitmap is inited to sync_bitmap for incremental
> > backup. For top/full backup it is inited to all ones, but in parallel with
> > workers main coroutine skips not allocated clusters.
> >
> > Worker coroutines are copying clusters, preferable awaited by notifiers
> > (for which NotifierRequest exists in the list). Function get_work helps
> > them.
> > Workers will copy clusters, awaited by notifiers even if block-job is
> > paused - it is the same behaviour as in old architecture.
> >
> > Old backup fails guest-write if notifier fails to backup corresponding
> > clusters. In the new scheme there is a little difference: notifier just
> > wait for 5s and if backup can't copy all corresponding clusters in this
> > time - guest-write fails.
> > Error scenarios was considered on list, the final solution was to provide
> > user a possibility to chose what should be failed: backup or guest-write.
> > I'll add this later.
> >
> > Worker can exit (no more clusters to copy or fatal error) or pause (error
> > or user pause or throttling). When last worker goes to pause it rings up
> > main block-job coroutine, which will handle user pause or errors. We need
> > to handle errors in main coroutine because of nature of
> > block_job_error_action, which may yield.
> >
> > There also is a bonus: new io-retrying scheme: if there is an error on read
> > or write, worker just yield in the retrying loop and if it will be resumed
> > (with job->error_exit = false) it will continue from the same place, so if
> > we have failed write after successful read we will not reread.
> >
> > Vladimir Sementsov-Ogievskiy (21):
> > backup: move from done_bitmap to copy_bitmap
> > backup: init copy_bitmap from sync_bitmap for incremental
> > backup: improve non-dirty bits progress processing
> > backup: use copy_bitmap in incremental backup
> > hbitmap: improve dirty iter
> > backup: rewrite top mode cluster skipping
> > backup: refactor: merge top/full/incremental backup code
> > backup: skip unallocated clusters for full mode
> > backup: separate copy function
> > backup: refactor backup_copy_cluster()
> > backup: move r/w error handling code to r/w functions
> > iotests: add supported_cache_modes to main function
> > coroutine: add qemu_coroutine_add_next
> > block: add trace point on bdrv_close_all
> > bitmap: add bitmap_count_between() function
> > hbitmap: add hbitmap_count_between() function
> > backup: make all reads not serializing
> > backup: new async architecture
> > backup: refactor backup_do_cow
> > backup: move bitmap handling from backup_do_cow to get_work
> > backup: refactor: remove backup_do_cow()
> >
> > block.c | 1 +
> > block/backup.c | 871
> > +++++++++++++++++++++++++++++++-----------
> > block/trace-events | 34 +-
> > blockjob.c | 29 +-
> > include/block/blockjob.h | 15 +-
> > include/qemu/bitmap.h | 4 +
> > include/qemu/coroutine.h | 2 +
> > include/qemu/hbitmap.h | 26 +-
> > tests/qemu-iotests/055 | 4 +-
> > tests/qemu-iotests/129 | 6 +-
> > tests/qemu-iotests/iotests.py | 7 +-
> > util/bitmap.c | 27 ++
> > util/hbitmap.c | 32 +-
> > util/qemu-coroutine.c | 7 +
> > 14 files changed, 805 insertions(+), 260 deletions(-)
> >
> > --
> > 1.8.3.1
> >