qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support


From: Jeff Cody
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 08 Jun 2012 13:46:27 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 06/08/2012 12:11 PM, Kevin Wolf wrote:
> Am 08.06.2012 16:32, schrieb Jeff Cody:
>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <address@hidden> wrote:
>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>> Note that block-commit cannot work on the top-most image since the
>>>>> guest is still writing to that image and we might never be able to
>>>>> copy all the data into the base image (the guest could write new data
>>>>> as quickly as we copy it to the base).  The command should check for
>>>>> this and reject the top-most image.
>>>>
>>>> By this you mean that you would like to disallow committing the
>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>> converge, and adaptively give more time to the co-routine if we are able
>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>> however, and make the commit less 'live'.
>>>
>>> Yes, I think we should disallow merging down the topmost image.  The
>>> convergence problem is the same issue that live migration has.  It's a
>>> hard problem and not something that is essential for block-commit.  It
>>> would add complexity into the block-commit implementation - the only
>>> benefit would be that you can merge down the last COW snapshot.  We
>>> already have an implementation that does this: the "commit" command
>>> which stops the guest :).
>>
>> OK.  And, it is better to get this implemented now, and if desired, I
>> suppose we can always revisit the convergence at a later date (or not at
>> all, if there is no pressing case for it).
> 
> Not allowing live commit for the topmost image may be a good way to
> break the problem down into more manageable pieces, but eventually the
> feature isn't complete until you can do it.
> 
> Basically what you need there is the equivalent of an active mirror. I
> wouldn't be surprised if actually code could be reused for both.

I agree, doing it like mirroring for new writes on the top layer makes
sense, as long as you are willing to violate the (optional) speed
parameter.  I wouldn't think violating the speed parameter in that case
would be an issue, as long as it is a documented affect of performing a
live commit on the active (top) layer.

> 
>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>
>>>>
>>>> I think we are almost completely on the same page - devil is in the
>>>> details, of course (for instance, on how to convert the destination base
>>>> from r/o to r/w).
>>>
>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>> be implemented on in parallel.  Are you happy to implement the atomic
>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>> assume that part already works and focus on implementing the commit
>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>> work, please let me know if you think this is good.
>>
>> I am happy to do it that way.  I'll shift my focus to the atomic image
>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>> for block-commit on the wiki, because I don't think it conflicts with we
>> discussed above (although I will modify my diagrams to not show commit
>> from the top-level image).  Of course, feel free to change it as
>> necessary.
> 
> I may have mentioned it before, but just in case: I think Supriya's
> bdrv_reopen() patches are a good starting point. I don't know why they
> were never completed, but I think we all agreed on the general design,
> so it should be possible to pick that up.
> 
> Though if you have already started with your own work on it, Jeff, I
> expect that it won't be much different because it's basically the same
> transactional approach that you know and that we already used for group
> snapshots.
>

I will definitely use parts of Supriya's as it makes sense - what I
started work on is similar to bdrv_append() and the current transaction
approach, so there will be plenty in common to reuse, even with some
differences.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]