[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
From: |
Marcelo Tosatti |
Subject: |
Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job |
Date: |
Thu, 3 Nov 2011 14:34:24 -0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <address@hidden> wrote:
> > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> >> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> >> + void *buf, int max_sectors, int *n)
> >> +{
> >> + BlockDriverState *bs = s->common.bs;
> >> + int ret;
> >> +
> >> + trace_stream_one_iteration(s, sector_num, max_sectors);
> >> +
> >> + ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> >> + if (ret < 0) {
> >> + return ret;
> >> + }
> >
> > bdrv_is_allocated is still synchronous? If so, there should be at least
> > a plan to make it asynchronous.
>
> Yes, that's a good discussion to have. My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context. The
> semantics are a little tricky because of parallel requests:
>
> 1. If a write request is in progress when we do bdrv_is_allocated() we
> might get back "unallocated" even though clusters are just being
> allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
> might get back "allocated" even though clusters are just being
> deallocated.
>
> In order to be reliable the caller needs to be aware of parallel
> requests. I think it's correct to defer this problem to the caller.
>
> In the case of image streaming we're not TRIM-safe, I haven't really
> thought about it yet. But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.
>
> >> + if (!ret) {
> >> + ret = stream_populate(bs, sector_num, *n, buf);
> >> + }
> >> + return ret;
> >> +}
> >> +
> >> +static void coroutine_fn stream_run(void *opaque)
> >> +{
> >> + StreamBlockJob *s = opaque;
> >> + BlockDriverState *bs = s->common.bs;
> >> + int64_t sector_num, end;
> >> + int ret = 0;
> >> + int n;
> >> + void *buf;
> >> +
> >> + buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> >> + s->common.len = bdrv_getlength(bs);
> >> + bdrv_get_geometry(bs, (uint64_t *)&end);
> >> +
> >> + bdrv_set_zero_detection(bs, true);
> >> + bdrv_set_copy_on_read(bs, true);
> >
> > Should distinguish between stream initiated and user initiated setting
> > of zero detection and cor (so that unsetting below does not clear
> > user settings).
>
> For zero detection I agree.
>
> For copy-on-read it is questionable since once streaming is complete
> it does not make sense to have copy-on-read enabled.
>
> I will address this in the next revision and think more about the
> copy-on-read case.
>
> >> +
> >> + for (sector_num = 0; sector_num < end; sector_num += n) {
> >> + if (block_job_is_cancelled(&s->common)) {
> >> + break;
> >> + }
> >
> > If cancellation is seen here in the last loop iteration,
> > bdrv_change_backing_file below should not be executed.
>
> I documented this case in the QMP API. I'm not sure if it's possible
> to guarantee that the operation isn't just completing as you cancel
> it. Any blocking point between completion of the last iteration and
> completing the operation is vulnerable to missing the cancel. It's
> easier to explicitly say the operation might just have completed when
> you canceled, rather than trying to protect the completion path. Do
> you think it's a problem to have these loose semantics that I
> described?
No, that is ok. I'm referring to bdrv_change_backing_file() being
executed without the entire image being streamed.
"if (sector_num == end && ret == 0)" includes both all sectors being
streamed and all sectors except the last iteration being streamed (due
to job cancelled break).
> >> +
> >> + /* TODO rate-limit */
> >> + /* Note that even when no rate limit is applied we need to yield
> >> with
> >> + * no pending I/O here so that qemu_aio_flush() is able to return.
> >> + */
> >> + co_sleep_ns(rt_clock, 0);
> >
> > How do you plan to implement rate limit?
>
> It was implemented in the QED-specific image streaming series:
>
> http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
>
> That implementation works fine and is small but I'd like to reuse the
> migration speed limit, if possible. That way we don't have 3
> different rate-limiting implementations in QEMU :).
One possibility would be to create a "virtual" block device for
streaming, sitting on top of the real block device. Then enforce block
I/O limits on the virtual block device, the guest would remain accessing
the real block device.