
Re: [Qemu-block] [PATCH] migration: flush the bdrv before stopping VM


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH] migration: flush the bdrv before stopping VM
Date: Wed, 18 Mar 2015 14:42:00 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 18.03.2015 at 13:36, Juan Quintela wrote:
> Kevin Wolf <address@hidden> wrote:
> > The problem is that the block layer really doesn't have an option to
> > control what is getting synced once the data is cached outside of qemu.
> > Essentially we can do an fdatasync() or we can leave it, that's the only
> > choice we have.
> 
> See my explanation: if all that qemu is doing is an fdatasync(), just
> spawn a thread that does the fdatasync(), and if the thread doesn't
> finish within <timeout>, just return an error.  You can implement that
> behaviour however you want.

Yes, and if you use bdrv_co_flush(), this is one of the worker threads
that raw-posix uses. So no new code to write for that.
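
For reference, the bare mechanism Juan describes (thread + fdatasync()
+ timeout) looks roughly like this in plain POSIX terms. This is an
untested sketch with made-up names, not code from the patch:

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>
#include <unistd.h>

struct flush_job {
    int fd;      /* file descriptor of the image file */
    sem_t done;  /* posted by the worker when the flush finishes */
};

static void *flush_worker(void *opaque)
{
    struct flush_job *job = opaque;

    fdatasync(job->fd);
    sem_post(&job->done);
    return NULL;
}

/* Returns 0 if the flush finished within timeout_ms, -ETIMEDOUT
 * otherwise.  On timeout the worker keeps running in the background,
 * so the caller must keep *job alive until the semaphore is posted. */
static int flush_with_timeout(struct flush_job *job, int timeout_ms)
{
    pthread_t tid;
    struct timespec ts;

    sem_init(&job->done, 0, 0);
    pthread_create(&tid, NULL, flush_worker, job);
    pthread_detach(tid);

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (long)(timeout_ms % 1000) * 1000000;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    if (sem_timedwait(&job->done, &ts) != 0 && errno == ETIMEDOUT) {
        return -ETIMEDOUT;
    }
    return 0;
}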

> > Now doing an asynchronous fdatasync() in the background is completely
> > reasonable in order to reduce the amount of data to be flushed later.
> > But the patch is doing it while holding both the BQL and the AIOContext
> > lock of the device, which doesn't feel right. Maybe it should schedule a
> > BH in the AIOContext instead and flush from there asynchronously.
> 
> The position is wrong, definitely.  We want to start the asynchronous
> fdatasync() at the start of migration, or every X milliseconds.  At
> this point, we *think* that we can finish the migration within
> max_downtime (basically we are ignoring the time it will take to
> migrate the device state and do the block layer flush, but normally
> that is on the order of 100-200 ms, so it doesn't matter at all).
> 
> > The other thing is that flushing once doesn't mean that new data isn't
> > accumulating in the cache, especially if you decide to do the background
> > flush at the start of the migration.
> 
> But "pontentially", we would arrive to this point with less "cached"
> data everything on the system.
> 
> > The obvious way to avoid that would be to switch to a writethrough mode,
> > so any write request goes directly to the disk. This will, however,
> > impact performance so heavily that it's probably not a realistic option.
> >
> > An alternative approach could be to repeat the background flush
> > periodically, either time based or after every x bytes that are written
> > to a device. Time based should actually be quite easy to implement.
> 
> We can do it periodically from the migration thread, if the call is
> thread-safe.  We already have a loop there, and a "kind" of time
> control; what we are missing is an interface.

Don't do it from the migration thread. You'd have to take locks and stop
the guest from doing I/O while the flush is running.

Instead, create a coroutine and use a BH to enter it from the main loop
of the AIOContext of the block device (i.e. the qemu main loop thread in
most cases, or the dataplane thread if it exists). Then your flush will
run in the background without blocking anyone else.
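
Schematically, and with the caveat that the coroutine/BH signatures
have changed across QEMU versions, the hand-off could look like this
(an untested sketch, not code from the patch):

static void coroutine_fn background_flush_co(void *opaque)
{
    BlockDriverState *bs = opaque;

    /* For raw-posix this dispatches to one of the worker threads. */
    bdrv_co_flush(bs);
}

static void background_flush_bh(void *opaque)
{
    BlockDriverState *bs = opaque;
    Coroutine *co = qemu_coroutine_create(background_flush_co, bs);

    /* Runs in the AioContext of bs, so no extra locking is needed. */
    qemu_coroutine_enter(co);
}

static void start_background_flush(BlockDriverState *bs)
{
    /* Called from the migration thread: defer the actual work to the
     * main loop (or dataplane) thread that owns the device's context. */
    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                            background_flush_bh, bs);
}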

Time control in coroutines is as easy as calling co_aio_sleep_ns().
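
Extending the sketch above to a periodic flush; the stop flag is a
made-up placeholder for whatever condition migration would use:

static bool flush_cancelled;  /* hypothetical: set by migration code */

static void coroutine_fn periodic_flush_co(void *opaque)
{
    BlockDriverState *bs = opaque;

    while (!flush_cancelled) {
        bdrv_co_flush(bs);
        co_aio_sleep_ns(bdrv_get_aio_context(bs), QEMU_CLOCK_REALTIME,
                        100 * 1000 * 1000 /* flush every 100 ms */);
    }
}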

> > Once we have the flushing in the background, the migration code can
> > apply any timeouts it wants. If the timeout is exceeded, the flush
> > wouldn't be aborted but keep running in the background, but migration
> > can go back to the iterative state anyway.
> 
> Yeap, that is what we really want/need.

Cool. I don't think it's very hard to get there.

> Notice that my understanding is that any proper fix for this is 2.4 material.

Yes, I agree.

Kevin


