qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support


From: Jeff Cody
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 08 Jun 2012 14:33:19 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 06/08/2012 01:57 PM, Kevin Wolf wrote:
> Am 08.06.2012 19:46, schrieb Jeff Cody:
>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <address@hidden> wrote:
>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>>> guest is still writing to that image and we might never be able to
>>>>>>> copy all the data into the base image (the guest could write new data
>>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>>> this and reject the top-most image.
>>>>>>
>>>>>> By this you mean that you would like to disallow committing the
>>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>>> however, and make the commit less 'live'.
>>>>>
>>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>>> convergence problem is the same issue that live migration has.  It's a
>>>>> hard problem and not something that is essential for block-commit.  It
>>>>> would add complexity into the block-commit implementation - the only
>>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>>> already have an implementation that does this: the "commit" command
>>>>> which stops the guest :).
>>>>
>>>> OK.  And, it is better to get this implemented now, and if desired, I
>>>> suppose we can always revisit the convergence at a later date (or not at
>>>> all, if there is no pressing case for it).
>>>
>>> Not allowing live commit for the topmost image may be a good way to
>>> break the problem down into more manageable pieces, but eventually the
>>> feature isn't complete until you can do it.
>>>
>>> Basically what you need there is the equivalent of an active mirror. I
>>> wouldn't be surprised if actually code could be reused for both.
>>
>> I agree, doing it like mirroring for new writes on the top layer makes
>> sense, as long as you are willing to violate the (optional) speed
>> parameter.  I wouldn't think violating the speed parameter in that case
>> would be an issue, as long as it is a documented affect of performing a
>> live commit on the active (top) layer.
> 
> The question is what the speed parameter really means for a live commit
> block job (or for an active mirror). I think it would make sense to
> limit only the speed of I/O that doesn't come from the guest but is from
> the background copying.
> 
> If you did apply it to guest I/O as well, then I think the logical
> consequence would be that guest I/O is throttled as well because
> requests only complete when they have reached both source and target.
> 
> (I know I'm happily mixing terminology of all kinds of live block jobs
> here, hope you understand anyway what I mean ;-))

I understand what you mean :) My thought is that the block commit
'speed' parameter would essentially only apply to 'old data', that is
being copied in the commit coroutine, and it would be ignored (violated)
for 'new data' in the mirroring-like case.

I assumed we wouldn't want to throttle the guest I/O, but that may be a
bad assumption on my part.

> 
>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>> discussed above (although I will modify my diagrams to not show commit
>>>> from the top-level image).  Of course, feel free to change it as
>>>> necessary.
>>>
>>> I may have mentioned it before, but just in case: I think Supriya's
>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>> were never completed, but I think we all agreed on the general design,
>>> so it should be possible to pick that up.
>>>
>>> Though if you have already started with your own work on it, Jeff, I
>>> expect that it won't be much different because it's basically the same
>>> transactional approach that you know and that we already used for group
>>> snapshots.
>>
>> I will definitely use parts of Supriya's as it makes sense - what I
>> started work on is similar to bdrv_append() and the current transaction
>> approach, so there will be plenty in common to reuse, even with some
>> differences.
> 
> I'm not sure how far a bdrv_append-like approach will work in this case.
> The problem is that you must not open any image r/w twice. The policy is
> that you can open an image either once r/w or multiple times read-only.

I think there are 3 possibilities that would be OK: open once r/w, one
or more times r/o, or once r/w + one or more times r/o.  Do you agree
with this?  Is the latter case against block-layer policy?

In this special case for block commit, we need to do the reopen() to go
from a r/o image to a r/w image.  If an image is opened r/o, the drv
layer can do another open with a new descriptor, this time r/w, without
closing the r/o instance.  If the new r/w open was successful, then the
r/o instance can be closed, and some bdrv_append()-like operations done.
Otherwise, like the current transactional code, we can just 'walk away'
by only closing the new image (if it got that far), and the existing BDS
with the r/o image is safely unchanged (and never closed, so no
unrecoverable error paths).

> Raw images are probably forgiving if you happen to open them twice r/w,
> but more complex image formats like qcow2 or qed are likely to get
> corrupted if you do it.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]