qemu-devel

Re: [Qemu-devel] migration: broken ram_save_pending


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] migration: broken ram_save_pending
Date: Wed, 5 Feb 2014 09:09:13 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

* Paolo Bonzini (address@hidden) wrote:
> On 04/02/2014 23:17, Alexey Kardashevskiy wrote:
> >>>>> Well, it will fix it in my particular case but in the long run this
> >>>>> does not feel like a fix - there should be a way for
> >>>>> migration_thread() to know that ram_save_iterate() sent all the
> >>>>> dirty pages it had to send, no?
> >>>
> >>> No, because new pages might be dirtied while ram_save_iterate() was
> >>> running.
> >
> >I do not get it, sorry. In my example the ram_save_iterate() sends
> >everything in one go but its caller thinks that it did not and tries again.
> 
> It's not that "the caller thinks that it did not".  The caller knows
> what happens, because migration_bitmap_find_and_reset_dirty updates
> the migration_dirty_pages count that ram_save_pending uses.  So
> migration_dirty_pages should be 0 when ram_save_pending is entered.
> 
> However, something gets dirty in between so remaining_size is again
> 393216 when ram_save_pending returns, after the
> migration_bitmap_sync call.  Because of this the migration thread
> thinks that ram_save_iterate() _will_ not send everything in one go.
> 
> At least, this is how I read the code.  Perhaps I'm wrong. ;)

My reading was a bit different.

I think the case Alexey is hitting is:
   1 A few dirtied pages
   2 but because of the hpratio most of the data is actually zero
     - indeed most of the target-page sized chunks are zero
   3 Thus the data compresses very heavily
   4 When the bandwidth/delay calculation happens, it has spent a reasonable
     amount of time transferring a reasonable number of pages but not
     actually many bytes on the wire, so the estimate of the available
     bandwidth is lower than reality.
   5 The max-downtime calculation then compares pending dirty bytes
     (uncompressed) against that bandwidth measured from compressed bytes

(5) is bound to fail if the compression ratio is particularly high, which,
because of the hpratio, it is when we're only dirtying one word in an
entire host page.

What I'm not too sure of: you'd think that if only a few pages were dirtied,
the loop would complete quite quickly, so the delay would also be small;
bytes-on-wire would then be divided by a small value and the bandwidth
estimate would not be too bad.

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


