* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
13.03.2018 13:30, Dr. David Alan Gilbert wrote:
* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
12.03.2018 18:30, Dr. David Alan Gilbert wrote:
* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
There would be savevm states (dirty-bitmap) which can migrate only in
postcopy stage. The corresponding pending is introduced here.
Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
[...]
static MigIterateState migration_iteration_run(MigrationState *s)
{
- uint64_t pending_size, pend_post, pend_nonpost;
+ uint64_t pending_size, pend_pre, pend_compat, pend_post;
bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
- qemu_savevm_state_pending(s->to_dst_file, s->threshold_size,
- &pend_nonpost, &pend_post);
- pending_size = pend_nonpost + pend_post;
+ qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
+ &pend_compat, &pend_post);
+ pending_size = pend_pre + pend_compat + pend_post;
trace_migrate_pending(pending_size, s->threshold_size,
- pend_post, pend_nonpost);
+ pend_pre, pend_compat, pend_post);
if (pending_size && pending_size >= s->threshold_size) {
/* Still a significant amount to transfer */
if (migrate_postcopy() && !in_postcopy &&
- pend_nonpost <= s->threshold_size &&
- atomic_read(&s->start_postcopy)) {
+ pend_pre <= s->threshold_size &&
+ (atomic_read(&s->start_postcopy) ||
+ (pend_pre + pend_compat <= s->threshold_size)))
This change does something different from the description;
it causes a postcopy_start even if the user never ran the postcopy-start
command; so sorry, we can't do that; because postcopy for RAM is
something that users can enable but only switch into when they've given
up on it completing normally.
However, I guess that leaves you with a problem; which is what happens
to the system when you've run out of pend_pre+pend_compat but can't
complete because pend_post is non-0; so I don't know the answer to that.
Hmm. Here, we go to postcopy only if "pend_pre + pend_compat <=
s->threshold_size". Pre-patch, in this case we will go to
migration_completion(). So, precopy stage is finishing anyway.
Right.
So, we want
in this case to finish ram migration like it was finished by
migration_completion(), and then, run postcopy, which will handle only dirty
bitmaps, yes?
It's a bit tricky; the first important thing is that we can't change the
semantics of the migration without the 'dirty bitmaps'.
So then there's the question of how a migration with both
postcopy-ram+dirty bitmaps should work; again I don't think we should
enter the postcopy-ram phase until start-postcopy is issued.
Then there's the 3rd case; dirty-bitmaps but no postcopy-ram; in that
case I worry less about the semantics of how you want to do it.
I have an idea:
in postcopy_start(), in ram_has_postcopy() (and may be some other places?),
check atomic_read(&s->start_postcopy) instead of migrate_postcopy_ram()
We've got to use migrate_postcopy_ram() to decide whether we should do
ram specific things, e.g. send the ram discard data.
I'm wanting to make sure that if we have another full postcopy device
(like RAM, maybe storage say) that we'll just add that in with
migrate_postcopy_whatever().
then:
1. behavior without dirty-bitmaps is not changed, as currently we cant go
into postcopy_start and ram_has_postcopy without s->start_postcopy
2. dirty-bitmaps+ram: if user don't set s->start_postcopy, postcopy_start()
will operate as if migration capability was not enabled, so ram should
complete its migration
3. only dirty-bitmaps: again, postcopy_start() will operate as if migration
capability was not enabled, so ram should complete its migration
Why can't we just remove the change to the trigger condition in this
patch? Then I think everything works as long as the management layer
does eventually call migration-start-postcopy ?
(It might waste some bandwidth at the point where there's otherwise
nothing left to send).
Even with the use of migrate-start-postcopy, you're going to need to be
careful about the higher level story; you need to document when to do it
and what the higher levels should do after a migration failure - at the
moment they know that once postcopy starts migration is irrecoverable if
it fails; I suspect that's not true with your dirty bitmaps.
IMHO this still comes back to my original observation from ~18months ago
that in many ways this isn't very postcopy like; in the sense that all
the semantics are quite different from RAM.
Dave
Hmm2. Looked through migration_completion(), I don't understand, how it
finishes ram migration without postcopy. It calls
qemu_savevm_state_complete_precopy(), which skips states with
has_postcopy=true, which is ram...
Because savevm_state_complete_precopy only skips has_postcopy=true in
the in_postcopy case:
(in_postcopy && se->ops->has_postcopy &&
se->ops->has_postcopy(se->opaque)) ||
so when we call it in migration_completion(), if we've not entered
postcopy yet, then that test doesn't trigger.
(Apologies for not spotting this earlier; but I thought this patch was
a nice easy one just adding the postcopy_only_pending - I didn't realise it
changed
existing semantics until I spotted that)
oh, yes, I was inattentive :(
Dave
--
Best regards,
Vladimir
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
--
Best regards,
Vladimir
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK