qemu-devel

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 8 Jun 2012 13:42:13 +0100

On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <address@hidden> wrote:
> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
>> Data and Trying to go Unnoticed") at LinuxCon Japan, and he advised me
>> to post what I have regarding my question on qemu's support for shrinking
>> CoW images.
>>
>> Here's my problem description.
>>
>> I recently designed an experimental system which holds VM master images
>> on an HDD and CoW snapshots on an SSD. VMs run on CoW snapshots only.
>> This split-image configuration is done to keep VM I/O on the SSD.
>>
>> As SSD capacity is rather limited, I need to do a writeback commit from the
>> SSD to the HDD from time to time, which is done on weekends or at midnight.
>> The problem is that although a commit is made, that alone won't shrink the
>> CoW image; all the now-unneeded blocks are still kept in the snapshot and
>> use up space.
>>
>> The attached patch is a workaround I added to cope with the problem,
>> but the basic problem I faced was that neither the QCOW2 nor the QED format
>> supports the "bdrv_make_empty" API yet.
>>
>> Implementing the API (say, by hole punching) seemed like a lot of effort, so
>> I ended up creating a new CoW image and then replacing the current CoW
>> snapshot with the new (empty) one. But I find the code ugly.
>>
>> In his talk, Paolo suggested the possibility of using the new "live op" API
>> for this task, but I'm not familiar with that API. Is there any documentation
>> or source code I can look at to re-implement the above feature?
>>
>> Best Regards,
>
> Hello Taisuke-san,
>
> I am working on a document now for a live commit proposal, with the API
> being similar to the block-stream command, but for a live commit.  Here
> is what I am thinking about proposing for the command:
>
> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>                                       '*top': 'str', '*speed': 'int' } }
>
> I think something similar to the above would be good for a 'live
> commit', and it would be somewhat analogous to block streaming, but in
> the other direction.
>
> One issue I see with the attached patch is the reliance on bdrv_close()
> and a subsequent bdrv_open(): once you perform a bdrv_close(), you no
> longer have the ability to safely recover from an error, because the
> recovery bdrv_open() may itself fail for some reason.
>
> The live block commit command I am working on operates like the block
> streaming code and like the transactional commands, in that the use of
> bdrv_close() / bdrv_open() to change an image is avoided, so that error
> recovery can be safely done by just abandoning the operation.  A key
> step that needs to be done 'transactionally' is opening the base or
> intermediate target image with file access mode r/w, as the backing
> files are opened r/o by default.
>
> I am going to be putting all my documentation into the qemu wiki today /
> tomorrow, and I will follow up with a link to that if you like.
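
The error-recovery point above can be illustrated with a toy Python model: the backing image is made writable in place, clusters are copied, and on failure the job simply restores the original access mode and leaves the chain untouched, so nothing has to be closed and reopened. This is a minimal sketch under stated assumptions, not QEMU code: the Image class, CommitError, commit_transactionally and the fail_at fault-injection hook are all hypothetical, and flipping read_only only stands in for opening the backing file r/w.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Image:
    # Hypothetical toy image: cluster index -> data, plus an access-mode flag.
    name: str
    read_only: bool = True
    clusters: Dict[int, bytes] = field(default_factory=dict)


class CommitError(Exception):
    pass


def commit_transactionally(chain: List[Image], overlay_idx: int,
                           fail_at: int = -1) -> List[Image]:
    """Commit chain[overlay_idx] into chain[overlay_idx - 1] without closing either.

    chain[0] is the bottom image, chain[-1] the active image. fail_at simulates
    an I/O error on the n-th cluster copy (hypothetical test hook).
    """
    if not 0 < overlay_idx < len(chain) - 1:
        raise ValueError("need an intermediate image (not the base, not the active top)")
    base, overlay = chain[overlay_idx - 1], chain[overlay_idx]
    was_read_only = base.read_only
    base.read_only = False  # stands in for gaining write access to the backing file
    try:
        for n, (idx, data) in enumerate(sorted(overlay.clusters.items())):
            if n == fail_at:
                raise CommitError(f"I/O error copying cluster {idx}")
            # The overlay still shadows this cluster, so a partial copy
            # does not change what the guest reads.
            base.clusters[idx] = data
    finally:
        base.read_only = was_read_only  # on error: abandon, chain list never modified
    # Success: drop the committed overlay from the chain.
    return chain[:overlay_idx] + chain[overlay_idx + 1:]


vm001 = Image("vm001.img", clusters={0: b"A0", 1: b"A1"})
snap1 = Image("snap1.qcow2", clusters={1: b"S1"})
snap2 = Image("snap2.qcow2", read_only=False)
new_chain = commit_transactionally([vm001, snap1, snap2], 1)
print([img.name for img in new_chain])  # ['vm001.img', 'snap2.qcow2']
```

If the copy fails partway, the caller still holds the original, fully usable chain, which is the property that a bdrv_close() / bdrv_open() sequence cannot guarantee.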

Thanks for sharing.  This is also something Zhi Hui and I have been
thinking about; my notes are below.  The key difference from Taisuke's
requirement is that I imagined we would simply not support merging the
top image down while the VM is running.  You could only merge down an
image that is not the top-most.

<quote>
For incremental backup we typically have a backing file chain like this:

vm001.img <-- snap1.qcow2 <-- snap2.qcow2

The guest is writing to snap2.qcow2.  vm001.img and snap1.qcow2 are
read-only and the guest cannot write to them.

We want to commit snap1.qcow2 down into vm001.img while the guest is running:

vm001.img <-- snap2.qcow2

This means copying allocated blocks from snap1.qcow2 and writing them
into vm001.img.  Once this process is complete it is safe to delete
snap1.qcow2 since all data is now in vm001.img.
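
The copy step can be sketched with a toy model in which each image is a Python dict mapping cluster index to data, and a missing key means "unallocated here, read the backing file". The helper names commit_into_base and guest_view are hypothetical; the sketch only shows why the guest-visible data is unchanged once snap1.qcow2 is dropped from the chain.

```python
from typing import Dict, List, Optional

Layer = Dict[int, bytes]  # cluster index -> data (toy model, not a real image format)


def commit_into_base(overlay: Layer, base: Layer) -> int:
    """Copy every cluster allocated in `overlay` into `base`; return how many."""
    copied = 0
    for index, data in sorted(overlay.items()):
        base[index] = data
        copied += 1
    return copied


def guest_view(chain: List[Layer], n_clusters: int) -> List[Optional[bytes]]:
    # chain is ordered base first, top last; the topmost allocated copy wins.
    return [next((layer[i] for layer in reversed(chain) if i in layer), None)
            for i in range(n_clusters)]


vm001 = {0: b"A0", 1: b"A1", 2: b"A2", 3: b"A3"}
snap1 = {1: b"S1", 3: b"S3"}   # clusters written after the first snapshot
snap2 = {3: b"G3"}             # active image the guest is writing to

before = guest_view([vm001, snap1, snap2], 4)
copied = commit_into_base(snap1, vm001)
after = guest_view([vm001, snap2], 4)  # snap1.qcow2 removed from the chain
print(copied, before == after)  # 2 True
```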

As a result we have made the backing file chain shorter.  This is
important because otherwise incremental backup would grow the backing
file chain forever: each time it takes a new snapshot the chain
becomes longer and I/O accesses can become slower!
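
Why a longer chain can slow I/O down: a read of a cluster that is not allocated in the top image falls through to each backing file in turn until some layer has it. The toy lookup below (hypothetical read_cluster, dict-per-layer model) counts the layers visited; a real image format caches metadata, so this is a worst-case illustration rather than a performance model.

```python
from typing import Dict, List, Optional, Tuple

Layer = Dict[int, bytes]  # cluster index -> data for clusters allocated in this layer


def read_cluster(chain: List[Layer], index: int) -> Tuple[Optional[bytes], int]:
    """Return (data, layers_visited); chain is ordered base first, top last."""
    visited = 0
    for layer in reversed(chain):  # start at the image the guest writes to
        visited += 1
        if index in layer:
            return layer[index], visited
    return None, visited  # unallocated everywhere (reads as zeroes in a real image)


base = {0: b"base0"}
chain = [base] + [{} for _ in range(9)]  # nine empty incremental snapshots on top
data, visited = read_cluster(chain, 0)
print(data, visited)  # b'base0' 10
```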

The task is to add a new block job type called "commit".  It is like
the qemu-img commit command except it works while the guest is
running.

The new QMP command should look like this:

{ 'command': 'block-commit', 'data': { 'device': 'str', 'image': 'str',
                                       'base': 'str', '*speed': 'int' } }
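
On the wire, a QMP command is a JSON object with "execute" and "arguments" keys. The sketch below builds that message for the schema proposed here; the argument names are taken from this proposal only, so this is not a reference for any released QEMU version, and the helper block_commit_cmd and the device name "virtio0" are hypothetical.

```python
import json
from typing import Optional


def block_commit_cmd(device: str, image: str, base: str,
                     speed: Optional[int] = None) -> str:
    """Serialize a block-commit request following the proposed schema."""
    args = {"device": device, "image": image, "base": base}
    if speed is not None:  # '*speed' is optional in the schema
        if speed < 0:
            raise ValueError("speed must be non-negative")
        args["speed"] = speed
    return json.dumps({"execute": "block-commit", "arguments": args})


print(block_commit_cmd("virtio0", "snap1.qcow2", "vm001.img"))
```

A QMP client must first complete capabilities negotiation (the qmp_capabilities command) before the monitor accepts commands such as this one.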

This command can take a backing file chain:

base <- a <- b <- image <- c

It copies allocated blocks from a <- b <- image into base:

base <- c

After the operation completes a, b, and image can be deleted.
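
Committing several layers at once works if they are applied oldest first, so that a newer layer's cluster overwrites an older one and the base ends up holding exactly what the guest currently sees. A minimal sketch of base <- a <- b <- image <- c, using the same hypothetical dict-per-layer model (commit_range is an illustrative name):

```python
from typing import Dict, List

Layer = Dict[int, bytes]  # cluster index -> data (toy model)


def commit_range(chain: List[Layer], base_idx: int, top_idx: int) -> List[Layer]:
    """Commit chain[base_idx + 1 .. top_idx] into chain[base_idx].

    chain[0] is the bottom image and chain[-1] the active image; committing
    the active image is refused, as in the notes.
    """
    if not 0 <= base_idx < top_idx < len(chain) - 1:
        raise ValueError("base must be below top, and top must not be the active image")
    base = chain[base_idx]
    # Oldest layer first, so newer clusters overwrite older ones.
    for layer in chain[base_idx + 1 : top_idx + 1]:
        base.update(layer)
    # The image above top_idx now uses the base as its backing file.
    return chain[: base_idx + 1] + chain[top_idx + 1 :]


base = {0: b"base0", 1: b"base1", 2: b"base2"}
a = {0: b"a0"}
b = {0: b"b0", 1: b"b1"}
image = {2: b"img2"}
c = {1: b"c1"}  # active image, still written by the guest

new_chain = commit_range([base, a, b, image, c], base_idx=0, top_idx=3)
print(len(new_chain), base)  # 2 {0: b'b0', 1: b'b1', 2: b'img2'}
```

In a real qcow2 file, the final relinking step corresponds to updating c's backing-file reference in its header so that it points at base.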

Note that block-commit cannot work on the top-most image since the
guest is still writing to that image and we might never be able to
copy all the data into the base image (the guest could write new data
as quickly as we copy it to the base).  The command should check for
this and reject the top-most image.
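
The argument check described above might look like the following sketch, where the chain is given as a list of image names from the base (first) to the active image (last). The function name and error messages are hypothetical; on success it returns the images that become deletable.

```python
from typing import List


def validate_block_commit(chain: List[str], image: str, base: str) -> List[str]:
    """Return the images the commit would make deletable, or raise ValueError."""
    if image not in chain or base not in chain:
        raise ValueError("image and base must both be in the backing chain")
    top_idx, base_idx = chain.index(image), chain.index(base)
    if top_idx == len(chain) - 1:
        raise ValueError(f"{image} is the active top-most image; refusing to commit")
    if base_idx >= top_idx:
        raise ValueError("base must be below image in the chain")
    return chain[base_idx + 1 : top_idx + 1]


chain = ["base", "a", "b", "image", "c"]
print(validate_block_commit(chain, "image", "base"))  # ['a', 'b', 'image']
```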

This command is similar to block-stream but it copies data "down" to
the backing file instead of "up" from the backing file.  It's
necessary to add this command because in most cases block-commit is
much more efficient than block-stream (the CoW file usually has much
less data than the backing file so less data needs to be copied).
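
A back-of-the-envelope count makes the efficiency argument concrete: commit copies the clusters allocated in the CoW file, while streaming copies the backing file's clusters that the CoW file does not already override. The numbers below are hypothetical (a fully allocated 10,000-cluster base and a snapshot that touched 1% of it), not measurements.

```python
from typing import Set


def clusters_to_commit(overlay: Set[int]) -> int:
    # Commit copies what the CoW file has allocated down into its backing file.
    return len(overlay)


def clusters_to_stream(backing: Set[int], overlay: Set[int]) -> int:
    # Stream copies backing-file data up, except clusters the overlay already overrides.
    return len(backing - overlay)


backing = set(range(10_000))
overlay = set(range(0, 10_000, 100))
print(clusters_to_commit(overlay), clusters_to_stream(backing, overlay))  # 100 9900
```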
</unquote>

Let's figure out how to specify block-commit so we're all happy; that
way we can avoid duplicating work.  Any comments on my notes above?

Stefan


