
Re: [Qemu-block] [PATCH 00/21] new backup architecture


From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH 00/21] new backup architecture
Date: Mon, 9 Jan 2017 11:04:27 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Fri, Dec 23, 2016 at 05:28:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Jeff or John: are you reviewing this?

> This is a new architecture for backup. It solves several current problems:
> 1. intersecting requests: currently, at request start we wait for all 
> intersecting requests, which means that
>     a. we may wait even for clusters unrelated to our request
>     b. not fully async: if we are going to copy clusters 1,2,3,4 while 2 and 4 
> are in flight, why should we wait for 2 and 4 to be fully copied? Why not 
> start 1 and 3 in parallel with 2 and 4?
> 
> 2. the notifier request is internally synchronous: if a notifier starts copying 
> clusters 1,2,3,4, they are copied one by one in a synchronous loop.
> 
> 3. the notifier waits for full copying of the corresponding clusters (when 
> actually it could wait only for the _read_ operations to finish)
> 
> In short, what is done:
> 1. full async scheme
> 2. no intersecting requests
> 3. notifiers wait only for reads, not for writes
> 4. notifiers wait only for the corresponding clusters
> 5. time limit for notifiers
> 6. skip unallocated clusters in full mode
> 7. use HBitmap as the main backup bitmap and just init it from the dirty 
> bitmap in the incremental case
> 8. retrying: do not reread on write failure
> 
> # Intro
> 
> Instead of sync copying + async notifiers as in the old backup, or AIO 
> requests as in mirror, this scheme just starts 24 workers - separate 
> coroutines, each of which copies clusters synchronously. Copying is done one 
> cluster at a time; there are no large requests.
> The only difference for clusters awaited by write notifiers is a higher 
> priority. So notifiers do not start io requests; they just mark some 
> clusters as awaited and yield. Then, when some worker completes the read of 
> the last cluster awaited by a notifier, it wakes that notifier up.
> 
> # Some data structures
> 
> Instead of done_bitmap there is copy_bitmap, as in mirror.
> HBitmap copy_bitmap
>     Exactly what should be copied:
>     0 - may mean one of three things:
>         - this cluster should not be copied at all
>         - this cluster is in flight
>         - this cluster is already copied
>     1 - means that the cluster should be copied but not touched yet (no async 
> io exists for it)
> 
> New bitmap: notif_wait_bitmap - not an HBitmap, just a Bitmap.
>     Exactly the in-flight clusters waiting for the read operation:
>     0 - may mean one of three things:
>         - this cluster should not be copied at all
>         - this cluster is in flight and has _already_ been read into memory
>         - this cluster is already copied
>     1 - means that the cluster is in flight, but the read operation has not
>         finished yet
>     The only exception is none mode: in this case 1 means in flight: in io 
> read or write. This is needed for image fleecing.
> 
> Cluster states (copy_bitmap, notif_wait_bitmap)
> 
> 0, 0 - Ignored (should not be copied at all) or In flight (read done) or 
> Copied
> 0, 1 - In flight, read operation not finished (or write op. - for none-mode)
> 1, 0 - Should be copied, but not touched yet
> 1, 1 - Impossible state
> 
> NotifierRequest - a request from a notifier; it changes the sequence of 
> cluster copying by the workers.
> NotifierRequest {
>     int64_t start;
>     int64_t end;
>     int nb_wait; // nb clusters (in the specified range) that should be copied 
> but are not read yet, i.e. clusters awaited by this notifier
>     Coroutine *notif; // corresponding notifier coroutine
> }
> 
> notifier_reqs - list of notifier requests
> 
> # More info
> 
> At backup start, copy_bitmap is initialized from sync_bitmap for incremental 
> backup. For top/full backup it is initialized to all ones, but in parallel 
> with the workers the main coroutine skips unallocated clusters.
> 
> Worker coroutines copy clusters, preferring those awaited by notifiers (for 
> which a NotifierRequest exists in the list). The function get_work helps them.
> Workers will copy clusters awaited by notifiers even if the block job is 
> paused - the same behaviour as in the old architecture.
> 
> The old backup fails a guest write if the notifier fails to back up the 
> corresponding clusters. In the new scheme there is a small difference: the 
> notifier just waits for 5s, and if backup can't copy all the corresponding 
> clusters in this time, the guest write fails.
> Error scenarios were considered on the list; the final solution was to give 
> the user the possibility to choose what should fail: the backup or the guest 
> write. I'll add this later.
> 
> A worker can exit (no more clusters to copy, or a fatal error) or pause (an 
> error, a user pause, or throttling). When the last worker goes to pause, it 
> wakes up the main block-job coroutine, which will handle user pause or 
> errors. We need to handle errors in the main coroutine because of the nature 
> of block_job_error_action, which may yield.
> 
> There is also a bonus: a new io-retrying scheme: if there is an error on read 
> or write, the worker just yields in the retry loop, and if it is resumed 
> (with job->error_exit = false) it will continue from the same place, so if we 
> have a failed write after a successful read we will not reread.
> 
> Vladimir Sementsov-Ogievskiy (21):
>   backup: move from done_bitmap to copy_bitmap
>   backup: init copy_bitmap from sync_bitmap for incremental
>   backup: improve non-dirty bits progress processing
>   backup: use copy_bitmap in incremental backup
>   hbitmap: improve dirty iter
>   backup: rewrite top mode cluster skipping
>   backup: refactor: merge top/full/incremental backup code
>   backup: skip unallocated clusters for full mode
>   backup: separate copy function
>   backup: refactor backup_copy_cluster()
>   backup: move r/w error handling code to r/w functions
>   iotests: add supported_cache_modes to main function
>   coroutine: add qemu_coroutine_add_next
>   block: add trace point on bdrv_close_all
>   bitmap: add bitmap_count_between() function
>   hbitmap: add hbitmap_count_between() function
>   backup: make all reads not serializing
>   backup: new async architecture
>   backup: refactor backup_do_cow
>   backup: move bitmap handling from backup_do_cow to get_work
>   backup: refactor: remove backup_do_cow()
> 
>  block.c                       |   1 +
>  block/backup.c                | 871 
> +++++++++++++++++++++++++++++++-----------
>  block/trace-events            |  34 +-
>  blockjob.c                    |  29 +-
>  include/block/blockjob.h      |  15 +-
>  include/qemu/bitmap.h         |   4 +
>  include/qemu/coroutine.h      |   2 +
>  include/qemu/hbitmap.h        |  26 +-
>  tests/qemu-iotests/055        |   4 +-
>  tests/qemu-iotests/129        |   6 +-
>  tests/qemu-iotests/iotests.py |   7 +-
>  util/bitmap.c                 |  27 ++
>  util/hbitmap.c                |  32 +-
>  util/qemu-coroutine.c         |   7 +
>  14 files changed, 805 insertions(+), 260 deletions(-)
> 
> -- 
> 1.8.3.1
> 


