
Re: [Qemu-devel] [RFC] Generic image streaming


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] Generic image streaming
Date: Tue, 27 Sep 2011 13:07:31 +0100

On Mon, Sep 26, 2011 at 3:21 PM, Stefan Hajnoczi
<address@hidden> wrote:
> On Mon, Sep 26, 2011 at 09:35:01AM -0300, Marcelo Tosatti wrote:
>> On Fri, Sep 23, 2011 at 04:57:26PM +0100, Stefan Hajnoczi wrote:
>> > Here is my generic image streaming branch, which aims to provide a way
>> > to copy the contents of a backing file into an image file of a running
>> > guest without requiring specific support in the various block drivers
>> > (e.g.  qcow2, qed, vmdk):
>> >
>> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
>> >
>> > The tree does not provide full image streaming yet but I'd like to
>> > discuss the approach taken in the code.  Here are the main points:
>> >
>> > The image streaming API is available through HMP and QMP commands.  When
>> > streaming is started on a block device a coroutine is created to do the
>> > background I/O work.  The coroutine can be cancelled.
>> >
>> > While the coroutine copies data from the backing file into the image
>> > file, the guest may be performing I/O to the image file.  Guest reads do
>> > not conflict with streaming but guest writes require special handling.
>> > If the guest writes to a region of the image file that we are currently
>> > copying, then there is the potential to clobber the guest write with old
>> > data from the backing file.
>> >
>> > Previously I solved this in a QED-specific way by taking advantage of
>> > the serialization of allocating write requests.  In order to do this
>> > generically we need to track in-flight requests and have the ability to
>> > queue I/O.  Guest writes that affect an in-flight streaming copy
>> > operation must wait for that operation to complete before being issued.
>> > Streaming copy operations must skip overlapping regions of guest writes.
>> >
>> > One big difference to the QED image streaming implementation is that
>> > this generic implementation is not based on copy-on-read operations.
>> > Instead we do a sequence of bdrv_is_allocated() to find regions for
>> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
>> > populate the image file.
>> >
>> > It turns out that generic copy-on-read is not an attractive operation
>> > because it requires using bounce buffers for every request.
>>
>> Isn't COR essential for decent read performance in the
>> image-stream-from-slow-remote-origin case?
>
> It is essential for re-read performance from a slow backing file.  With
> images over internet HTTP it most definitely is worth doing
> copy-on-read.
>
> In the case of an NFS server the performance depends on the network and
> server.  It might be similar speed or faster to read from NFS.
>
> I will think some more about how to implement generic copy-on-read.

I've sketched out how generic copy-on-read can work.  It's probably
not much extra effort since we need request tracking and the ability
to queue/hold requests anyway.

I hope to have patches implementing this by the end of the week:

1. When CoR is enabled, overlapping requests get queued so that only
one is actually being issued to the host at a time.  This prevents
race conditions where a guest write request is clobbered by a
copy-on-read.  Note that only overlapping requests are queued;
non-overlapping requests proceed in parallel.

2. The read operation uses bdrv_is_allocated() first to see whether a
copy-on-read needs to be performed or if we can go down the fast path.
The fast path is the normal read straight into the guest buffer.  The
copy-on-read path reads into a bounce buffer, writes into the image
file, and then copies the bounce buffer into the guest buffer.

3. The .bdrv_is_allocated() implementations will be audited and
improved to make them aio/coroutine-friendly where necessary.
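
To make points 1 and 2 concrete, here is a rough, self-contained sketch
of the overlap test and the read path.  It is only a toy model and not
the code from the branch: the "disk" is a pair of in-memory byte arrays
with a per-cluster allocation flag, and every name in it (ModelDisk,
ranges_overlap(), model_is_allocated(), model_cor_read(), CLUSTER_SIZE)
is made up for illustration.  The real block layer works through
coroutines and the real bdrv_is_allocated() interface looks different,
so treat this as an assumption-laden outline of the control flow only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry for the toy model; not QEMU values. */
#define CLUSTER_SIZE 4u
#define NUM_CLUSTERS 4u
#define DISK_SIZE    (CLUSTER_SIZE * NUM_CLUSTERS)

/* In-memory stand-in for an image file layered on a backing file. */
typedef struct {
    uint8_t  backing[DISK_SIZE];      /* backing file contents (read-only) */
    uint8_t  image[DISK_SIZE];        /* image file contents */
    bool     allocated[NUM_CLUSTERS]; /* true once a cluster is in the image */
    unsigned cor_copies;              /* clusters populated by copy-on-read */
} ModelDisk;

/*
 * Point 1: two byte ranges [off, off + len) overlap exactly when each one
 * starts before the other ends.  Only requests for which this returns true
 * would need to be queued behind each other.
 */
static bool ranges_overlap(size_t a_off, size_t a_len,
                           size_t b_off, size_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Stand-in for the allocation query (assumed per-cluster granularity). */
static bool model_is_allocated(const ModelDisk *d, size_t cluster)
{
    return d->allocated[cluster];
}

/*
 * Point 2: read with copy-on-read, one cluster at a time.
 * Allocated clusters take the fast path straight into the guest buffer.
 * Unallocated clusters are read from the backing file into a bounce
 * buffer, written into the image file, and then copied to the guest.
 */
static void model_cor_read(ModelDisk *d, size_t offset, size_t len,
                           uint8_t *guest_buf)
{
    size_t done = 0;

    assert(offset + len <= DISK_SIZE);

    while (done < len) {
        size_t pos        = offset + done;
        size_t cluster    = pos / CLUSTER_SIZE;
        size_t in_cluster = pos % CLUSTER_SIZE;
        size_t n          = CLUSTER_SIZE - in_cluster;

        if (n > len - done) {
            n = len - done;
        }

        if (model_is_allocated(d, cluster)) {
            /* Fast path: data already lives in the image file. */
            memcpy(guest_buf + done, d->image + pos, n);
        } else {
            /* Copy-on-read path: populate the whole cluster. */
            uint8_t bounce[CLUSTER_SIZE];
            size_t  start = cluster * CLUSTER_SIZE;

            memcpy(bounce, d->backing + start, CLUSTER_SIZE);
            memcpy(d->image + start, bounce, CLUSTER_SIZE);
            d->allocated[cluster] = true;
            d->cor_copies++;

            memcpy(guest_buf + done, bounce + in_cluster, n);
        }
        done += n;
    }
}
```

In this model a second read of the same range finds the clusters
allocated and takes the fast path, which is the re-read benefit
discussed above.  The sketch deliberately leaves out the queueing
itself: a real implementation would hold a request while
ranges_overlap() is true against any in-flight request.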

Stefan


