Re: [Qemu-block] [PATCH 00/21] new backup architecture
From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH 00/21] new backup architecture
Date: Mon, 9 Jan 2017 11:04:27 +0000
User-agent: Mutt/1.7.1 (2016-10-04)
On Fri, Dec 23, 2016 at 05:28:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
Jeff or John: are you reviewing this?
> This is a new architecture for backup. It solves some current problems:
> 1. intersecting requests: currently, at request start we wait for all
> intersecting requests, which means that:
> a. we may wait even for clusters unrelated to our request
> b. not fully async: if we are going to copy clusters 1,2,3,4 while 2 and 4
> are in flight, why should we wait for 2 and 4 to be fully copied? Why not
> start 1 and 3 in parallel with 2 and 4?
>
> 2. a notifier request is internally synchronous: if a notifier starts copying
> clusters 1,2,3,4, they are copied one by one in a synchronous loop.
>
> 3. a notifier waits for the full copy of the corresponding clusters (when
> actually it only needs to wait for the _read_ operations to finish)
>
> In short, what is done:
> 1. fully async scheme
> 2. no intersecting requests
> 3. notifiers can wait only for reads, not for writes
> 4. notifiers wait only for the corresponding clusters
> 5. time limit for notifiers
> 6. skip unallocated clusters in full mode
> 7. use HBitmap as the main backup bitmap and just initialize it from the
> dirty bitmap in the incremental case
> 8. retrying: do not re-read after a failed write
>
> # Intro
>
> Instead of sync copying + async notifiers as in the old backup, or aio
> requests as in mirror, this scheme just starts 24 workers - separate
> coroutines, each of them copying clusters synchronously. Copying is done one
> cluster at a time; there are no large requests.
> The only difference for clusters awaited by write notifiers is a higher
> priority. So, notifiers do not start io requests themselves; they just mark
> some clusters as awaited and yield. Then, when some worker completes the read
> of the last cluster awaited by a notifier, it re-enters that notifier.
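>
> A minimal sketch of the worker loop (get_work() and backup_copy_cluster()
> are names from this series; the exact shape here is my assumption):
>
>     static void coroutine_fn backup_worker_co(void *opaque)
>     {
>         BackupBlockJob *job = opaque;
>         int64_t cluster;
>
>         /* get_work() hands out one cluster at a time, preferring
>          * clusters awaited by notifiers; -1 means nothing is left. */
>         while ((cluster = get_work(job)) >= 0) {
>             backup_copy_cluster(job, cluster); /* synchronous read + write */
>         }
>     }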
>
> # Some data structures
>
> Instead of done_bitmap there is copy_bitmap, as in mirror.
> HBitmap copy_bitmap
> It tracks exactly what should still be copied:
> 0 - may mean one of three things:
> - this cluster should not be copied at all
> - this cluster is in flight
> - this cluster is already copied
> 1 - means the cluster should be copied but has not been touched yet (no
> async io exists for it)
>
> New bitmap: notif_wait_bitmap - not an HBitmap, just a Bitmap.
> It tracks exactly the in-flight clusters whose read operation has not yet
> finished:
> 0 - may mean one of three things:
> - this cluster should not be copied at all
> - this cluster is in flight and has _already_ been read into memory
> - this cluster is already copied
> 1 - means the cluster is in flight, but the read operation has not finished
> yet
> The only exception is none mode: in this case 1 means in flight: in io
> read or write. This is needed for image fleecing.
>
> Cluster states (copy_bitmap, notif_wait_bitmap)
>
> 0, 0 - Ignored (should not be copied at all) or In flight (read done) or
> Copied
> 0, 1 - In flight, read operation not finished (or write op. - for none-mode)
> 1, 0 - Should be copied, but not touched yet
> 1, 1 - Impossible state
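>
> A sketch of the per-cluster bit transitions a worker performs (the bitmap
> calls are real QEMU APIs from qemu/hbitmap.h and qemu/bitmap.h; the field
> names are assumptions):
>
>     /* state (copy_bitmap, notif_wait_bitmap): 1,0 = should be copied */
>     hbitmap_reset(job->copy_bitmap, cluster, 1);
>     set_bit(cluster, job->notif_wait_bitmap);   /* 0,1: read in flight */
>     /* ... read the cluster from the source ... */
>     clear_bit(cluster, job->notif_wait_bitmap); /* 0,0: read done - wake the
>                                                  * notifier if this was its
>                                                  * last awaited cluster */
>     /* ... write the cluster to the target ...
>      * (in none mode the bit is cleared only after the write) */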
>
> NotifierRequest - a request from a notifier; it changes the order in which
> workers copy clusters.
> NotifierRequest {
>     int64_t start;
>     int64_t end;
>     int nb_wait;      /* nb clusters (in the specified range) that should be
>                          copied but are not yet read, i.e. the clusters
>                          awaited by this notifier */
>     Coroutine *notif; /* corresponding notifier coroutine */
> }
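>
> A sketch of the notifier side (my assumption of the shape;
> hbitmap_count_between() and bitmap_count_between() are added by patches in
> this series, and their exact signatures are assumed):
>
>     NotifierRequest req = {
>         .start = start_cluster,
>         .end   = end_cluster,
>         .notif = qemu_coroutine_self(),
>     };
>     /* awaited = still to be copied + reads currently in flight */
>     req.nb_wait =
>         hbitmap_count_between(job->copy_bitmap, start_cluster, end_cluster) +
>         bitmap_count_between(job->notif_wait_bitmap, start_cluster,
>                              end_cluster);
>     if (req.nb_wait > 0) {
>         /* put req on notifier_reqs, then sleep until a worker finishes
>          * the read of the last awaited cluster and re-enters us */
>         qemu_coroutine_yield();
>     }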
>
> notifier_reqs - list of notifier requests
>
> # More info
>
> At backup start, copy_bitmap is initialized from sync_bitmap for incremental
> backup. For top/full backup it is initialized to all ones, and in parallel
> with the workers the main coroutine skips unallocated clusters.
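>
> A sketch of that initialization (hbitmap_init_from() is a hypothetical
> helper; MIRROR_SYNC_MODE_INCREMENTAL and hbitmap_set() are real):
>
>     if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>         /* take the bits from the user's dirty bitmap */
>         hbitmap_init_from(job->copy_bitmap, job->sync_bitmap);
>     } else {
>         /* top/full: start from all ones ... */
>         hbitmap_set(job->copy_bitmap, 0, job->nb_clusters);
>         /* ... while the main coroutine concurrently resets the bits of
>          * clusters that turn out to be unallocated */
>     }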
>
> Worker coroutines copy clusters, preferring those awaited by notifiers (those
> for which a NotifierRequest exists in the list). The function get_work helps
> them pick a cluster. Workers will copy clusters awaited by notifiers even if
> the block-job is paused - the same behaviour as in the old architecture.
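>
> A sketch of the selection logic in get_work() (the list type and iterator
> field are assumptions; hbitmap_get() and hbitmap_iter_next() are real):
>
>     static int64_t get_work(BackupBlockJob *job)
>     {
>         NotifierRequest *req;
>         int64_t cluster;
>
>         /* first serve clusters awaited by a notifier */
>         QSIMPLEQ_FOREACH(req, &job->notifier_reqs, next) {
>             for (cluster = req->start; cluster < req->end; cluster++) {
>                 if (hbitmap_get(job->copy_bitmap, cluster)) {
>                     return cluster;
>                 }
>             }
>         }
>
>         /* otherwise take the next set bit from the main iterator;
>          * returns -1 when nothing is left */
>         return hbitmap_iter_next(&job->hbi);
>     }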
>
> The old backup fails the guest write if the notifier fails to back up the
> corresponding clusters. In the new scheme there is a small difference: the
> notifier just waits for 5s, and if backup can't copy all the corresponding
> clusters in that time, the guest write fails.
> Error scenarios were discussed on the list; the final solution was to give
> the user the possibility to choose what should fail: the backup or the
> guest write. I'll add this later.
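>
> One possible shape for the 5s limit (entirely my sketch: the timed_out
> field and the use of a QEMUTimer are assumptions; the timer APIs are real):
>
>     static void notifier_timeout_cb(void *opaque)
>     {
>         NotifierRequest *req = opaque;
>
>         req->timed_out = true;            /* hypothetical field */
>         qemu_coroutine_enter(req->notif); /* wake the waiting notifier */
>     }
>
>     /* in the notifier, before yielding: */
>     timer_mod(timer, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
>                      5 * NANOSECONDS_PER_SECOND);
>     qemu_coroutine_yield();
>     if (req.timed_out && req.nb_wait > 0) {
>         return -EIO;                      /* fail the guest write */
>     }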
>
> A worker can exit (no more clusters to copy, or a fatal error) or pause (an
> error, a user pause, or throttling). When the last worker goes to pause it
> wakes up the main block-job coroutine, which handles the user pause or the
> errors. We need to handle errors in the main coroutine because of the nature
> of block_job_error_action, which may yield.
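>
> A sketch of that hand-off (paused_workers/nb_workers are assumed field
> names; qemu_coroutine_add_next() is added by a patch in this series, and
> its exact signature is assumed here):
>
>     /* in a worker, on a non-fatal error / user pause / throttling */
>     if (++job->paused_workers == job->nb_workers) {
>         /* last worker to pause: schedule the main job coroutine, which
>          * can safely call block_job_error_action() (it may yield) */
>         qemu_coroutine_add_next(job->common.co);
>     }
>     qemu_coroutine_yield();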
>
> There is also a bonus: a new io-retrying scheme: if there is an error on read
> or write, the worker just yields in the retrying loop, and if it is resumed
> (with job->error_exit = false) it continues from the same place. So if we
> have a failed write after a successful read, we will not re-read.
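>
> A sketch of the write-side retry loop (write_cluster() and
> report_error_and_pause() are hypothetical helpers):
>
>     /* buf still holds the data read from the source earlier */
>     while (write_cluster(job, cluster, buf) < 0) {
>         report_error_and_pause(job);
>         qemu_coroutine_yield();          /* pause until resumed */
>         if (job->error_exit) {
>             return -1;                   /* fatal: stop retrying */
>         }
>         /* resumed with error_exit == false: retry the write; the
>          * successful read is not repeated */
>     }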
>
> Vladimir Sementsov-Ogievskiy (21):
> backup: move from done_bitmap to copy_bitmap
> backup: init copy_bitmap from sync_bitmap for incremental
> backup: improve non-dirty bits progress processing
> backup: use copy_bitmap in incremental backup
> hbitmap: improve dirty iter
> backup: rewrite top mode cluster skipping
> backup: refactor: merge top/full/incremental backup code
> backup: skip unallocated clusters for full mode
> backup: separate copy function
> backup: refactor backup_copy_cluster()
> backup: move r/w error handling code to r/w functions
> iotests: add supported_cache_modes to main function
> coroutine: add qemu_coroutine_add_next
> block: add trace point on bdrv_close_all
> bitmap: add bitmap_count_between() function
> hbitmap: add hbitmap_count_between() function
> backup: make all reads not serializing
> backup: new async architecture
> backup: refactor backup_do_cow
> backup: move bitmap handling from backup_do_cow to get_work
> backup: refactor: remove backup_do_cow()
>
> block.c | 1 +
> block/backup.c | 871 +++++++++++++++++++++++++++++++-----------
> block/trace-events | 34 +-
> blockjob.c | 29 +-
> include/block/blockjob.h | 15 +-
> include/qemu/bitmap.h | 4 +
> include/qemu/coroutine.h | 2 +
> include/qemu/hbitmap.h | 26 +-
> tests/qemu-iotests/055 | 4 +-
> tests/qemu-iotests/129 | 6 +-
> tests/qemu-iotests/iotests.py | 7 +-
> util/bitmap.c | 27 ++
> util/hbitmap.c | 32 +-
> util/qemu-coroutine.c | 7 +
> 14 files changed, 805 insertions(+), 260 deletions(-)
>
> --
> 1.8.3.1
>