

From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date: Tue, 7 Sep 2010 15:49:15 +0100

On Tue, Sep 7, 2010 at 3:34 PM, Kevin Wolf <address@hidden> wrote:
> Am 07.09.2010 15:41, schrieb Anthony Liguori:
>> Hi,
>>
>> We've got copy-on-read and image streaming working in QED and before
>> going much further, I wanted to bounce some interfaces off of the
>> libvirt folks to make sure our final interface makes sense.
>>
>> Here's the basic idea:
>>
>> Today, you can create images based on base images that are
>> copy-on-write.  With QED, we also support copy-on-read, which forces a
>> copy from the backing image on read requests and write requests.
>>
>> In addition to copy-on-read, we introduce a notion of streaming a
>> block device, which means that we search for an unallocated region of the
>> leaf image and force a copy-on-read operation.
>>
>> The combination of copy-on-read and streaming means that you can start a
>> guest based on slow storage (like over the network) and bring in blocks
>> on demand while also having a deterministic mechanism to complete the
>> transfer.
>>
>> The interface for copy-on-read is just an option within qemu-img
>> create.
>
> Shouldn't it be a runtime option? You can use the very same image with
> copy-on-read or copy-on-write and it will behave the same (except for
> performance), so it's not an inherent feature of the image file.
>
> Doing it this way has the additional advantage that you need no image
> format support for this, so we could implement copy-on-read for other
> formats, too.

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

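void bdrv_stream(BlockDriverState *bs)
{
    /* bdrv_getlength() reports bytes; the loop walks sectors */
    int64_t total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
    int64_t sector;
    int n;  /* length of the run at sector sharing its allocation state */

    for (sector = 0; sector < total_sectors; sector += n) {
        if (!bdrv_is_allocated(bs, sector, &n)) {
            bdrv_read(bs, sector, ...);   /* reads through to the backing file */
            bdrv_write(bs, sector, ...);  /* and populates the leaf image */
        }
    }
}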
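For illustration, the same loop can be run against a toy in-memory image. This is a sketch, not QEMU code: `struct toy_image` and `toy_is_allocated()` are hypothetical, the latter only imitating how bdrv_is_allocated() reports the length of a run with the same allocation state, and the inner copy plays the role of the bdrv_read()/bdrv_write() pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy in-memory model; names are hypothetical, not QEMU's block layer. */
#define NSECTORS 16

struct toy_image {
    char data[NSECTORS];
    bool allocated[NSECTORS];
    const char *backing;  /* contents of the backing file */
};

/* Returns whether `sector` is allocated and stores in *n the length of the
 * run starting at `sector` that shares that state (always at least 1). */
static bool toy_is_allocated(const struct toy_image *img, int sector, int *n)
{
    bool state = img->allocated[sector];
    int i = sector;
    while (i < NSECTORS && img->allocated[i] == state) {
        i++;
    }
    *n = i - sector;
    return state;
}

/* Generic streaming: copy every unallocated run up from the backing file.
 * Returns the number of sectors copied. */
static int toy_stream(struct toy_image *img)
{
    int copied = 0;
    int n;
    for (int sector = 0; sector < NSECTORS; sector += n) {
        if (!toy_is_allocated(img, sector, &n)) {
            for (int i = sector; i < sector + n; i++) {
                img->data[i] = img->backing[i];  /* the "read" half */
                img->allocated[i] = true;        /* the "write" half */
            }
            copied += n;
        }
    }
    return copied;
}
```

Because each iteration advances by a whole run, allocated regions are skipped in one step, and a second pass over a fully streamed image copies nothing.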

>
>> Streaming, on the other hand, requires a bit more thought.
>> Today, I have a monitor command that does the following:
>>
>> stream <device> <sector offset>
>>
>> Which will try to stream the minimal amount of data for a single I/O
>> operation and then return how many sectors were successfully streamed.
>>
>> The idea about how to drive this interface is a loop like:
>>
>> offset = 0;
>> while offset < image_size:
>>     wait_for_idle_time()
>>     count = stream(device, offset)
>>     offset += count
>>
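One way to pin down the semantics of the proposed `stream <device> <sector offset>` command is to model it together with the driving loop. In this hypothetical sketch, `stream()` handles at most one I/O's worth of sectors (`MAX_IO_SECTORS`, an assumed limit) and, as an assumption not stated in the proposal, also counts already-allocated sectors it skips, so `offset` always advances; `wait_for_idle_time()` is an empty stub.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed monitor command; not QEMU code. */
#define IMAGE_SECTORS 10
#define MAX_IO_SECTORS 4  /* assumed size of "a single I/O operation" */

struct device {
    bool allocated[IMAGE_SECTORS];
    int sectors_copied;  /* work actually done, for illustration */
};

/* Handle one run of same-state sectors starting at `offset`, capped at
 * MAX_IO_SECTORS. Unallocated runs are "copied"; allocated runs are only
 * skipped. Returns the number of sectors advanced over. */
static int stream(struct device *dev, int offset)
{
    bool state = dev->allocated[offset];
    int count = 0;
    while (offset + count < IMAGE_SECTORS && count < MAX_IO_SECTORS &&
           dev->allocated[offset + count] == state) {
        count++;
    }
    if (!state) {
        for (int i = offset; i < offset + count; i++) {
            dev->allocated[i] = true;  /* stands in for copy-on-read */
        }
        dev->sectors_copied += count;
    }
    return count;
}

/* Stand-in for wait_for_idle_time(); real idle detection needs host-wide
 * knowledge, which is the open question in this thread. */
static void wait_for_idle_time(void)
{
}

/* The driving loop from the proposal. Returns the number of calls made. */
static int stream_whole_device(struct device *dev)
{
    int offset = 0;
    int calls = 0;
    while (offset < IMAGE_SECTORS) {
        wait_for_idle_time();
        int count = stream(dev, offset);
        assert(count > 0);  /* guarantees the loop terminates */
        offset += count;
        calls++;
    }
    return calls;
}
```

If `stream()` instead returned 0 for already-allocated regions, the loop as written in the proposal would never advance past them, which is why the skipping behaviour matters for the interface.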
>> Obviously, "wait_for_idle_time()" requires system-wide awareness.
>> The thing I'm not sure about is whether libvirt would want to 1) expose a
>> similar stream interface and let management software determine idle time,
>> or 2) attempt to detect idle time on its own and provide a higher-level
>> interface.  If (2), the question then becomes whether we should try to
>> do this within qemu and provide libvirt a higher-level interface.
>
> I think libvirt shouldn't have to care about sector offsets. You should
> just tell qemu to fetch the image and it should do so. We could have
> something like -drive backing_mode=[cow|cor|stream].
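The proposed `backing_mode` switch could be parsed as in the sketch that follows. Neither the option nor these names exist in QEMU; the meaning given to each value (in particular, that `stream` implies copy-on-read) is an assumption drawn from the discussion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: the "-drive backing_mode=[cow|cor|stream]" idea from this
 * thread. This is a proposal, not an existing QEMU option. */
enum backing_mode {
    BACKING_MODE_COW,     /* copy-on-write only (today's behaviour) */
    BACKING_MODE_COR,     /* also copy backing data into the image on read */
    BACKING_MODE_STREAM,  /* copy-on-read plus background streaming */
    BACKING_MODE_INVALID,
};

static enum backing_mode parse_backing_mode(const char *value)
{
    if (value == NULL || strcmp(value, "cow") == 0) {
        return BACKING_MODE_COW;  /* default when the option is absent */
    }
    if (strcmp(value, "cor") == 0) {
        return BACKING_MODE_COR;
    }
    if (strcmp(value, "stream") == 0) {
        return BACKING_MODE_STREAM;
    }
    return BACKING_MODE_INVALID;
}

/* In this sketch, stream mode implies copy-on-read. */
static int backing_mode_copies_on_read(enum backing_mode m)
{
    return m == BACKING_MODE_COR || m == BACKING_MODE_STREAM;
}
```

Keeping the mode on the drive rather than in the image header matches the point above that the same image file works with any of the three behaviours.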
>
>> A related topic is block migration.  Today we support pre-copy migration
>> which means we transfer the block device and then do a live migration.
>> Another approach is to do the live migration first, then run a block
>> server on the source and use image streaming on the destination to move
>> the device.
>>
>> With QED, to implement this one would:
>>
>> 1) launch qemu-nbd on the source while the guest is running
>> 2) create a qed file on the destination with copy-on-read enabled and a
>> backing file using nbd: to point to the source qemu-nbd
>> 3) run qemu -incoming on the destination with the qed file
>> 4) execute the migration
>> 5) when migration completes, begin streaming on the destination to
>> complete the copy
>> 6) when the streaming is complete, shut down the qemu-nbd instance on
>> the source
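The six steps above can be modelled in miniature to show why the source's qemu-nbd has to stay up until streaming finishes. This is a hypothetical sketch: `remote_read()` stands in for a request to qemu-nbd through the nbd: backing file, the guest and the streaming pass share the same copy-on-read path, and guest writes are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of post-copy block migration; all names are hypothetical. */
#define DISK_SECTORS 8

struct source {
    char data[DISK_SECTORS];
    int reads_served;
    bool running;  /* qemu-nbd still up on the source */
};

struct dest {
    char data[DISK_SECTORS];
    bool allocated[DISK_SECTORS];
    struct source *backing;  /* the nbd: backing file from step 2 */
};

static char remote_read(struct source *src, int sector)
{
    assert(src->running);  /* the source must stay up until step 6 */
    src->reads_served++;
    return src->data[sector];
}

/* Steps 4-5: reads on the destination use copy-on-read. */
static char guest_read(struct dest *d, int sector)
{
    if (!d->allocated[sector]) {
        d->data[sector] = remote_read(d->backing, sector);
        d->allocated[sector] = true;
    }
    return d->data[sector];
}

/* Step 5: stream whatever the guest has not touched yet; sectors already
 * allocated are not fetched again. */
static void stream_remaining(struct dest *d)
{
    for (int s = 0; s < DISK_SECTORS; s++) {
        guest_read(d, s);
    }
}

/* Step 6: shutting down the source is safe only once every sector is local. */
static bool can_stop_source(const struct dest *d)
{
    for (int s = 0; s < DISK_SECTORS; s++) {
        if (!d->allocated[s]) {
            return false;
        }
    }
    return true;
}
```

Each sector crosses the network at most once, whether the guest touched it first or the streaming pass did.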
>
> Hm, that's an interesting idea. :-)
>
> Kevin
>
>


