
Re: [Qemu-devel] [PATCH 00/21] new backup architecture


From: John Snow
Subject: Re: [Qemu-devel] [PATCH 00/21] new backup architecture
Date: Tue, 10 Jan 2017 13:48:51 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0


On 01/10/2017 01:05 AM, Jeff Cody wrote:
> On Mon, Jan 09, 2017 at 11:04:27AM +0000, Stefan Hajnoczi wrote:
>> On Fri, Dec 23, 2016 at 05:28:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>
>> Jeff or John: are you reviewing this?
> 
> It's in my review queue, but it would probably be a good one for John to
> review as well if he has time.
> 

Sorry, just back from vacation. It may take me a bit to get to it.

>>
>>> This is a new architecture for backup. It solves several problems in the
>>> current code:
>>> 1. intersecting requests: currently, at request start we wait for all
>>> intersecting requests, which means that
>>>     a. we may wait for clusters unrelated to our request
>>>     b. we are not fully async: if we are going to copy clusters 1,2,3,4
>>> while 2 and 4 are in flight, why should we wait for 2 and 4 to be fully
>>> copied? Why not start 1 and 3 in parallel with 2 and 4?
>>>
>>> 2. a notifier request is internally synchronous: if a notifier starts
>>> copying clusters 1,2,3,4, they are copied one by one in a synchronous loop.
>>>
>>> 3. a notifier waits for the full copy of the corresponding clusters (when
>>> in fact it only needs to wait for the _read_ operations to finish)
>>>
>>> In short, what is done:
>>> 1. fully async scheme
>>> 2. no intersecting requests
>>> 3. notifiers wait only for reads, not for writes
>>> 4. notifiers wait only for the corresponding clusters
>>> 5. time limit for notifiers
>>> 6. skip unallocated clusters in full mode
>>> 7. use an HBitmap as the main backup bitmap and just initialize it from
>>> the dirty bitmap in the incremental case
>>> 8. retrying: do not re-read on write failure
>>>
>>> # Intro
>>>
>>> Instead of sync copying + async notifiers as in the old backup, or aio
>>> requests as in mirror, this scheme simply starts 24 workers - separate
>>> coroutines, each of which copies clusters synchronously. Copying is done
>>> one cluster at a time; there are no large requests.
>>> The only difference for clusters awaited by write notifiers is a higher
>>> priority. So notifiers do not start io requests themselves; they just
>>> mark some clusters as awaited and yield. Then, when some worker completes
>>> the read of the last cluster awaited by a notifier, it re-enters that
>>> notifier's coroutine.
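>>>
>>> A rough sketch of the worker loop (get_work and backup_copy_cluster are
>>> names from this series; the coroutine itself is approximate, not the
>>> actual patch code):
>>>
>>> static void coroutine_fn backup_worker_co(void *opaque)
>>> {
>>>     BackupBlockJob *job = opaque;
>>>     int64_t cluster;
>>>
>>>     /* get_work() prefers clusters awaited by write notifiers */
>>>     while ((cluster = get_work(job)) >= 0) {
>>>         /* synchronous copy of exactly one cluster */
>>>         backup_copy_cluster(job, cluster);
>>>     }
>>> }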
>>>
>>> # Some data structures
>>>
>>> done_bitmap is replaced by copy_bitmap, as in mirror.
>>> HBitmap copy_bitmap
>>>     Exactly what should still be copied:
>>>     0 - may mean one of three things:
>>>         - this cluster should not be copied at all
>>>         - this cluster is in flight
>>>         - this cluster has already been copied
>>>     1 - means that the cluster should be copied but has not been touched
>>> yet (no async io exists for it)
>>>
>>> New bitmap: notif_wait_bitmap - not an HBitmap, just a Bitmap.
>>>     Exactly the in-flight clusters whose read operation is not finished:
>>>     0 - may mean one of three things:
>>>         - this cluster should not be copied at all
>>>         - this cluster is in flight and has _already_ been read into
>>>           memory
>>>         - this cluster has already been copied
>>>     1 - means that the cluster is in flight but the read operation has
>>>         not finished yet
>>>     The only exception is none mode: in this case 1 means in flight (in
>>> io read or write). This is needed for image fleecing.
>>>
>>> Cluster states (copy_bitmap, notif_wait_bitmap)
>>>
>>> 0, 0 - Ignored (should not be copied at all) or In flight (read done) or 
>>> Copied
>>> 0, 1 - In flight, read operation not finished (or write op., in none mode)
>>> 1, 0 - Should be copied, but not touched yet
>>> 1, 1 - Impossible state
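>>>
>>> As a sketch, a worker could decode the pair like this (hbitmap_get and
>>> test_bit are the existing helpers; the enum is only for illustration):
>>>
>>> typedef enum {
>>>     CLUSTER_SKIPPED_DONE_OR_READ,   /* 0, 0 */
>>>     CLUSTER_READ_IN_FLIGHT,         /* 0, 1 */
>>>     CLUSTER_TO_COPY,                /* 1, 0 */
>>> } ClusterState;
>>>
>>> static ClusterState cluster_state(HBitmap *copy_bitmap,
>>>                                   const unsigned long *notif_wait_bitmap,
>>>                                   int64_t cluster)
>>> {
>>>     bool to_copy = hbitmap_get(copy_bitmap, cluster);
>>>     bool reading = test_bit(cluster, notif_wait_bitmap);
>>>
>>>     assert(!(to_copy && reading));  /* 1, 1 is impossible */
>>>
>>>     if (to_copy) {
>>>         return CLUSTER_TO_COPY;
>>>     }
>>>     return reading ? CLUSTER_READ_IN_FLIGHT : CLUSTER_SKIPPED_DONE_OR_READ;
>>> }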
>>>
>>> NotifierRequest - a request from a notifier; it changes the order in
>>> which workers copy clusters.
>>> NotifierRequest {
>>>     int64_t start;
>>>     int64_t end;
>>>     int nb_wait; // nb clusters (in the specified range) that should be
>>> copied but have not yet been read, i.e. the clusters awaited by this
>>> notifier
>>>     Coroutine *notif; // corresponding notifier coroutine
>>> }
>>>
>>> notifier_reqs - list of notifier requests
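>>>
>>> A sketch of the notifier side, using the structures above (the helper
>>> names here are invented for illustration):
>>>
>>> static void coroutine_fn notifier_wait(BackupBlockJob *job,
>>>                                        int64_t start, int64_t end)
>>> {
>>>     NotifierRequest req = {
>>>         .start = start,
>>>         .end = end,
>>>         /* hypothetical helper: clusters in [start, end) not yet read */
>>>         .nb_wait = clusters_awaited(job, start, end),
>>>         .notif = qemu_coroutine_self(),
>>>     };
>>>
>>>     if (req.nb_wait == 0) {
>>>         return;                      /* all reads in range already done */
>>>     }
>>>
>>>     add_notifier_request(job, &req); /* hypothetical: link into
>>>                                       * notifier_reqs */
>>>     qemu_coroutine_yield();          /* a worker enters us at nb_wait == 0 */
>>>     remove_notifier_request(job, &req);
>>> }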
>>>
>>> # More info
>>>
>>> At backup start, copy_bitmap is initialized from sync_bitmap for
>>> incremental backup. For top/full backup it is initialized to all ones,
>>> but, in parallel with the workers, the main coroutine skips (clears)
>>> unallocated clusters.
>>>
>>> Worker coroutines copy clusters, preferring those awaited by notifiers
>>> (those for which a NotifierRequest exists in the list). The function
>>> get_work helps them; see the sketch below.
>>> Workers will copy clusters awaited by notifiers even if the block job is
>>> paused - the same behaviour as in the old architecture.
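>>>
>>> get_work could look roughly like this (the field and helper names are
>>> guesses based on the description above, not the actual patch code):
>>>
>>> static int64_t get_work(BackupBlockJob *job)
>>> {
>>>     NotifierRequest *req;
>>>     int64_t cluster;
>>>
>>>     /* notifier ranges first: a guest write is blocked on them, and
>>>      * this part keeps running even while the job is paused */
>>>     QSIMPLEQ_FOREACH(req, &job->notifier_reqs, entry) {
>>>         cluster = next_cluster_to_copy(job, req->start, req->end);
>>>         if (cluster >= 0) {
>>>             return cluster;
>>>         }
>>>     }
>>>
>>>     if (job->paused) {
>>>         return -1;                  /* background copying stops on pause */
>>>     }
>>>     return next_cluster_to_copy(job, 0, job->nb_clusters);
>>> }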
>>>
>>> The old backup fails the guest write if the notifier fails to back up the
>>> corresponding clusters. In the new scheme there is a small difference:
>>> the notifier just waits for 5 s, and if the backup cannot copy all the
>>> corresponding clusters within that time, the guest write fails.
>>> Error scenarios were discussed on the list; the final decision was to
>>> give the user the possibility to choose what should fail: the backup or
>>> the guest write. I'll add this later.
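>>>
>>> The 5 s bound could be implemented with a timer that re-enters the
>>> notifier coroutine, roughly like this (a sketch; only the timer and
>>> coroutine APIs are real, the rest is illustrative):
>>>
>>> #define NOTIFIER_TIMEOUT_NS (5 * NANOSECONDS_PER_SECOND)
>>>
>>> static void notifier_timeout_cb(void *opaque)
>>> {
>>>     NotifierRequest *req = opaque;
>>>
>>>     qemu_coroutine_enter(req->notif);    /* wake the waiting notifier */
>>> }
>>>
>>> static int coroutine_fn notifier_wait_timed(NotifierRequest *req)
>>> {
>>>     QEMUTimer *timer = timer_new_ns(QEMU_CLOCK_REALTIME,
>>>                                     notifier_timeout_cb, req);
>>>
>>>     timer_mod(timer, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
>>>                      NOTIFIER_TIMEOUT_NS);
>>>     qemu_coroutine_yield();              /* woken by a worker or timer */
>>>     timer_del(timer);
>>>     timer_free(timer);
>>>
>>>     /* if the timer fired first, not all clusters were copied in time */
>>>     return req->nb_wait == 0 ? 0 : -ETIMEDOUT;
>>> }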
>>>
>>> A worker can exit (no more clusters to copy, or a fatal error) or pause
>>> (error, user pause or throttling). When the last worker goes to pause, it
>>> wakes up the main block-job coroutine, which handles the user pause or
>>> errors. We need to handle errors in the main coroutine because of the
>>> nature of block_job_error_action, which may yield.
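>>>
>>> Sketched, with illustrative field names:
>>>
>>> static void coroutine_fn worker_pause(BackupBlockJob *job)
>>> {
>>>     if (--job->running_workers == 0) {
>>>         /* last worker to pause: hand control to the main coroutine,
>>>          * which may safely call block_job_error_action() */
>>>         qemu_coroutine_enter(job->main_co);
>>>     }
>>>     qemu_coroutine_yield();              /* sleep until resumed */
>>>     job->running_workers++;
>>> }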
>>>
>>> There is also a bonus: a new io-retrying scheme: if there is an error on
>>> read or write, the worker just yields in the retry loop, and if it is
>>> resumed (with job->error_exit = false) it continues from the same place;
>>> so if a write fails after a successful read, we will not re-read.
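>>>
>>> In sketch form (cluster_read/cluster_write are invented stand-ins for
>>> the real read/write wrappers; job->error_exit is as above):
>>>
>>> static int coroutine_fn copy_cluster_with_retry(BackupBlockJob *job,
>>>                                                 int64_t cluster)
>>> {
>>>     int ret;
>>>
>>>     while ((ret = cluster_read(job, cluster)) < 0) {
>>>         qemu_coroutine_yield();     /* parked until the job is resumed */
>>>         if (job->error_exit) {
>>>             return ret;
>>>         }
>>>     }
>>>
>>>     /* data from the successful read stays in memory, so a failed
>>>      * write is retried without re-reading */
>>>     while ((ret = cluster_write(job, cluster)) < 0) {
>>>         qemu_coroutine_yield();
>>>         if (job->error_exit) {
>>>             return ret;
>>>         }
>>>     }
>>>     return 0;
>>> }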
>>>
>>> Vladimir Sementsov-Ogievskiy (21):
>>>   backup: move from done_bitmap to copy_bitmap
>>>   backup: init copy_bitmap from sync_bitmap for incremental
>>>   backup: improve non-dirty bits progress processing
>>>   backup: use copy_bitmap in incremental backup
>>>   hbitmap: improve dirty iter
>>>   backup: rewrite top mode cluster skipping
>>>   backup: refactor: merge top/full/incremental backup code
>>>   backup: skip unallocated clusters for full mode
>>>   backup: separate copy function
>>>   backup: refactor backup_copy_cluster()
>>>   backup: move r/w error handling code to r/w functions
>>>   iotests: add supported_cache_modes to main function
>>>   coroutine: add qemu_coroutine_add_next
>>>   block: add trace point on bdrv_close_all
>>>   bitmap: add bitmap_count_between() function
>>>   hbitmap: add hbitmap_count_between() function
>>>   backup: make all reads not serializing
>>>   backup: new async architecture
>>>   backup: refactor backup_do_cow
>>>   backup: move bitmap handling from backup_do_cow to get_work
>>>   backup: refactor: remove backup_do_cow()
>>>
>>>  block.c                       |   1 +
>>>  block/backup.c                | 871 +++++++++++++++++++++++++++++++++++-----------
>>>  block/trace-events            |  34 +-
>>>  blockjob.c                    |  29 +-
>>>  include/block/blockjob.h      |  15 +-
>>>  include/qemu/bitmap.h         |   4 +
>>>  include/qemu/coroutine.h      |   2 +
>>>  include/qemu/hbitmap.h        |  26 +-
>>>  tests/qemu-iotests/055        |   4 +-
>>>  tests/qemu-iotests/129        |   6 +-
>>>  tests/qemu-iotests/iotests.py |   7 +-
>>>  util/bitmap.c                 |  27 ++
>>>  util/hbitmap.c                |  32 +-
>>>  util/qemu-coroutine.c         |   7 +
>>>  14 files changed, 805 insertions(+), 260 deletions(-)
>>>
>>> -- 
>>> 1.8.3.1
>>>
> 
> 


