Re: [Qemu-block] [PATCH 15/18] block/mirror: Add active mirroring


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH 15/18] block/mirror: Add active mirroring
Date: Wed, 11 Oct 2017 14:33:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 2017-10-10 12:16, Kevin Wolf wrote:
> Am 18.09.2017 um 18:26 hat Max Reitz geschrieben:
>> On 2017-09-18 12:06, Stefan Hajnoczi wrote:
>>> On Sat, Sep 16, 2017 at 03:58:01PM +0200, Max Reitz wrote:
>>>> On 2017-09-14 17:57, Stefan Hajnoczi wrote:
>>>>> On Wed, Sep 13, 2017 at 08:19:07PM +0200, Max Reitz wrote:
>>>>>> This patch implements active synchronous mirroring.  In active mode, the
>>>>>> passive mechanism will still be in place and is used to copy all
>>>>>> initially dirty clusters off the source disk; but every write request
>>>>>> will write data both to the source and the target disk, so the source
>>>>>> cannot be dirtied faster than data is mirrored to the target.  Also,
>>>>>> once the block job has converged (BLOCK_JOB_READY sent), source and
>>>>>> target are guaranteed to stay in sync (unless an error occurs).
>>>>>>
>>>>>> Optionally, dirty data can be copied to the target disk on read
>>>>>> operations, too.
>>>>>>
>>>>>> Active mode is completely optional and currently disabled at runtime.  A
>>>>>> later patch will add a way for users to enable it.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <address@hidden>
>>>>>> ---
>>>>>>  qapi/block-core.json |  23 +++++++
>>>>>>  block/mirror.c       | 187 +++++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>  2 files changed, 205 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>> index bb11815608..e072cfa67c 100644
>>>>>> --- a/qapi/block-core.json
>>>>>> +++ b/qapi/block-core.json
>>>>>> @@ -938,6 +938,29 @@
>>>>>>    'data': ['top', 'full', 'none', 'incremental'] }
>>>>>>  
>>>>>>  ##
>>>>>> +# @MirrorCopyMode:
>>>>>> +#
>>>>>> +# An enumeration whose values tell the mirror block job when to
>>>>>> +# trigger writes to the target.
>>>>>> +#
>>>>>> +# @passive: copy data in background only.
>>>>>> +#
>>>>>> +# @active-write: when data is written to the source, write it
>>>>>> +#                (synchronously) to the target as well.  In addition,
>>>>>> +#                data is copied in background just like in @passive
>>>>>> +#                mode.
>>>>>> +#
>>>>>> +# @active-read-write: write data to the target (synchronously) both
>>>>>> +#                     when it is read from and written to the source.
>>>>>> +#                     In addition, data is copied in background just
>>>>>> +#                     like in @passive mode.
>>>>>
>>>>> I'm not sure the terms "active"/"passive" are helpful.  "Active commit"
>>>>> means committing the top-most BDS while the guest is accessing it.  The
>>>>> "passive" mirror block still works on the top-most BDS while the guest
>>>>> is accessing it.
>>>>>
>>>>> Calling it "asynchronous" and "synchronous" is clearer to me.  It's also
>>>>> the terminology used in disk replication (e.g. DRBD).
>>>>
>>>> I'd be OK with that, too, but I think I remember that in the past at
>>>> least Kevin made a clear distinction between active/passive and
>>>> sync/async when it comes to mirroring.
>>>>
>>>>> Ideally the user wouldn't have to worry about async vs sync because QEMU
>>>>> would switch modes as appropriate in order to converge.  That way
>>>>> libvirt also doesn't have to worry about this.
>>>>
>>>> So here you mean async/sync in the way I meant it, i.e., whether the
>>>> mirror operations themselves are async/sync?
>>>
>>> The meaning I had in mind is:
>>>
>>> Sync mirroring means a guest write waits until the target write
>>> completes.
>>
>> I.e. active-sync, ...
>>
>>> Async mirroring means guest writes complete independently of target
>>> writes.
>>
>> ... i.e. passive or active-async in the future.
> 
> So we already have at least three different modes, sync/async doesn't
> quite cut it anyway. There's a reason why we have been talking about
> both active/passive and sync/async.
> 
> When I was looking at the code, it actually occurred to me that there
> are more possible different modes than I thought there were: This patch
> waits for successful completion on the source before it even attempts to
> write to the destination.
> 
> Wouldn't it be generally (i.e. in the success case) more useful if we
> start both requests at the same time and only wait for both to complete,
> avoiding doubling the latency? If the source write fails, we're out of
> sync, obviously, so we'd have to mark the block dirty again.

I've thought about it, but my issues were:

(1) What to do when something fails
and
(2) I didn't really want to start coroutines from coroutines...

As for (1)...  My notes actually say I've come to a conclusion: If the
target write fails, that's pretty much OK, because then the source is
newer than the target, which is normal for mirroring.  If the source
write fails, we can just consider the target outdated, too (as you've
said).  Also, we'll give an error to the guest, so it's clear that
something has gone wrong.

So (2) was the reason I didn't do it in this series.  I think it's OK to
add this later on and let future me worry about how to coordinate both
requests.

I guess I'd start e.g. the target operation as a new coroutine, then
continue the source operation in the original one, and finally yield
until the target operation has finished?
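For what it's worth, the coordination sketched above (spawn the target write as its own coroutine, perform the source write in the calling one, then yield until the target finishes) can be modeled outside QEMU. The snippet below uses Python's asyncio purely as a stand-in for QEMU's coroutine machinery; `write_source`, `write_target` and `mark_dirty` are hypothetical placeholders, not real block-layer functions:

```python
import asyncio

async def mirrored_write(write_source, write_target, mark_dirty,
                         offset, data):
    """Issue source and target writes in parallel (sketch).

    Failure semantics as discussed in the thread: a failed target write
    is tolerable (the source is merely newer than the target, as in
    passive mirroring); a failed source write means the target must be
    considered outdated too, so the block is marked dirty again, and
    the error is propagated to the guest.
    """
    # Start the target write as a separate coroutine, then perform the
    # source write in the current one.
    target_task = asyncio.ensure_future(write_target(offset, data))
    try:
        await write_source(offset, data)
        source_ok = True
    except IOError:
        source_ok = False
    # Yield until the target operation has finished as well.
    try:
        await target_task
        target_ok = True
    except IOError:
        target_ok = False
    if not (source_ok and target_ok):
        # Either side failing means source and target may differ.
        mark_dirty(offset, len(data))
    if not source_ok:
        raise IOError("source write failed")  # reported to the guest
```

This is only a model of the control flow; the real implementation would use qemu_coroutine_create() and friends, and would have to coordinate with the block job's in-flight request tracking.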

> By the way, what happens when the guest modifies the RAM during the
> request? Is it acceptable even for writes if source and target differ
> after a successful write operation? Don't we need a bounce buffer
> anyway?

Sometimes I think that maybe I shouldn't keep my thoughts to myself
after I've come to the conclusion "...naah, it's all bad anyway". :-)

When Stefan mentioned this for reads, I thought about the write
situation, yes.  My conclusion was that the guest would be required (by
protocol) to keep the write buffer constant while the operation is
running, because otherwise the guest has no idea what is going to be on
disk.  So it would be stupid for the guest to modify the write buffer then.

But (1) depending on the emulated hardware, maybe the guest does have an
idea (e.g. some register that tells the guest which offset is currently
written) -- but with the structure of the block layer, I doubt that's
possible in qemu,

and (2) maybe the guest wants to be stupid.  Even if the guest doesn't
know what will end up on disk, we have to make sure that it's the same
on both source and target.

So, yeah, a bounce buffer would be good in all cases.
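To illustrate the point concretely: in the sketch below (plain Python standing in for the block layer; `source` and `target` are just byte arrays, not QEMU structures, and `in_flight` is a hypothetical hook representing guest activity during the request), copying the guest buffer once up front guarantees source and target receive identical bytes even if the guest scribbles over its buffer mid-request:

```python
def mirrored_write_bounce(source: bytearray, target: bytearray,
                          offset: int, guest_buf: bytearray,
                          in_flight=None) -> None:
    """Write guest data to source and target via a bounce buffer.

    Snapshotting guest_buf before issuing either write means both
    writes see the same data, regardless of what the guest does to
    guest_buf while the request is in flight.
    """
    bounce = bytes(guest_buf)  # snapshot the guest buffer once
    source[offset:offset + len(bounce)] = bounce
    if in_flight:
        in_flight()  # guest may modify guest_buf here; bounce is immune
    target[offset:offset + len(bounce)] = bounce
```

Without the bounce copy, a guest that modifies `guest_buf` between the two writes would leave source and target permanently out of sync after a "successful" operation, which is exactly the failure mode described above.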

Max
