From: Peter Xu
Subject: Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded
Date: Tue, 25 Jun 2024 10:53:41 -0400

On Tue, Jun 25, 2024 at 12:38:50PM +0100, Joao Martins wrote:
> On 24/06/2024 20:41, Peter Xu wrote:
> > On Fri, Jun 21, 2024 at 07:32:21AM -0700, Elena Ufimtseva wrote:
> >> @@ -2659,6 +2698,18 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis,
> >>      if (!check_section_footer(f, se)) {
> >>          return -EINVAL;
> >> @@ -2714,6 +2765,19 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis,
> >>                                      se->instance_id, end_ts - start_ts);
> >>      }
> >>  
> >> +    if (migrate_switchover_abort() && type == QEMU_VM_SECTION_END &&
> >> +        mis->downtime_start) {
> >> +        mis->downtime_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >> +        uint64_t dst_downtime = mis->downtime_now - mis->downtime_start;
> >> +        if (mis->src_downtime + dst_downtime >= mis->abort_limit) {
> >> +            error_report("Shutdown destination migration, migration abort_limit (%lu ms)"
> >> +                          "was reached.", mis->abort_limit);
> >> +            trace_qemu_loadvm_downtime_abort(mis->abort_limit, dst_downtime,
> >> +                                             mis->src_downtime);
> >> +            return -EINVAL;
> >> +        }
> >> +    }
> > 
> > So this traps both the iteration and non-iteration phases.  What if the
> > downtime was caused after these, or unluckily at the last non-iterable
> > device state?
> > 
> > After trying to think about it, I figured this is not easy to do right.
> > Also, I'm starting to doubt whether having this is a good idea in the
> > first place.
> > 
> > Firstly, I'm wondering how we should treat this new feature
> > vs. downtime_limit.  It's about how the user should set both.
> > 
> > If this is about "cancel the migration if the downtime is more than xxx",
> > then... AFAICT that's exactly what downtime_limit was "designed" to be.
> > It's just that downtime_limit says it the other way round: "don't
> > switchover if the downtime will be more than xxx".
> > 
> > Then, if the user has the option to set both these parameters, what would
> > be the right thing to do?  Shouldn't they simply always set both
> > parameters to the same value?  But if so, what's the point of having two?
> > 
> > This made me wonder whether the 2nd parameter is needed at all, or whether
> > we should instead simply make downtime_limit more accurate, so that it
> > starts to account for more things than before.  It won't be perfectly
> > accurate, but the same applies here: making this new feature "accurate"
> > can also be very hard.
> > 
> 
> The way I think about it is that the current downtime-limit nicely captures
> the data part, as the only calculation it cares about is how much outstanding
> data it sends to the destination (be it VF device state or RAM). This second
> parameter captures what is *not* related to data, i.e. the cost of the
> hypervisor quiescing the VM or added latencies in the hypervisor *in
> addition* to sending the outstanding data to the destination.
> 
> If we were to merge this all into a single parameter (aka downtime-limit) we
> are possibly artificially increasing the downtime by relaxing the outstanding
> data part, as opposed to trying to capture the switchover cost (hence the
> name of the parameter) that can't be reflected in the former equation.
> 
> So I feel like this needs two parameters, whereby this second new parameter
> covers the 'switchover cost' (hence the name) with the current migration
> algorithm.
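
Just to make sure I read that split right, the two checks would then live on
different sides and look roughly like below (a pseudo-C sketch only; the
helper names and variables are made up for illustration, not the actual QEMU
code):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Source side (existing downtime_limit behaviour): purely data-driven.
     * Switch over only once the estimated transfer time of the remaining
     * data fits within the configured downtime budget.
     */
    static bool can_switchover(uint64_t pending_bytes, uint64_t bandwidth_bps,
                               uint64_t downtime_limit_ms)
    {
        uint64_t est_transfer_ms = pending_bytes * 1000 / bandwidth_bps;
        return est_transfer_ms <= downtime_limit_ms;
    }

    /*
     * Destination side (new switchover/abort limit): wall-clock driven.
     * Abort when the blackout measured so far (source + destination)
     * exceeds the limit, regardless of how much data is left.
     */
    static bool should_abort(uint64_t src_downtime_ms, uint64_t dst_downtime_ms,
                             uint64_t abort_limit_ms)
    {
        return src_downtime_ms + dst_downtime_ms >= abort_limit_ms;
    }

If that's the model, the first check only bounds the estimated data-transfer
part, while the second bounds the whole measured blackout.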

Then the question is how we should suggest the user specify these two
parameters.

The cover letter used:

  migrate_set_parameter downtime-limit 300
  migrate_set_parameter switchover-limit 10

But that does look like a testing sample only, and not valid in a real world
setup, as logically the new limit should be larger than the old one, afaict.
If the new limit is larger, how much larger should it be?

So far I can understand how it works internally, but even with that I don't
know how I should set this parameter.  E.g., when downtime_limit used to be
300ms, I'd say 500ms could be a suitable value, but is it?  In that case,
perhaps 500ms is the "real" limit, in that I don't want a VM to be halted
for more than 500ms; but then the user should have already set
downtime_limit to 500ms.
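
I.e., the setup would presumably end up looking like (hypothetical values,
just restating the numbers above):

  migrate_set_parameter downtime-limit 300
  migrate_set_parameter switchover-limit 500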

I also don't know how a user should understand all these details and figure
out how to set these.  Note that currently we definitely allow the user to
specify downtime_limit, and so far it's understandable, even though the user
may assume it covers the whole blackout phase (that's also what we wish it
could achieve...), rather than the fact that it only takes the ram/vfio/...
reported remaining data into account vs. the bandwidth.

Maybe we could consider introducing the parameters/caps in some other form,
so that we can keep downtime_limit meaning "the total downtime", instead of
adding a new definition of downtime.  E.g., it's indeed not fair to spend
the whole "downtime_limit" on data transfers, so maybe a ratio parameter
could be introduced to say what portion of that downtime can be used for
data transfer; then it might be ok for the new cap to abort the migration
when the total downtime goes over the limit (but without a new "max
downtime" definition, which might be confusing).
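
As a rough illustration of what I mean (pseudo-C again; the parameter name
"downtime-data-ratio" and the helpers are hypothetical, just to show the
idea):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Hypothetical "downtime-data-ratio": only this fraction of
     * downtime_limit may be consumed by the estimated data transfer; the
     * rest is implicitly reserved for the non-data switchover cost.
     * downtime_limit itself keeps meaning "total blackout".
     */
    static bool can_switchover(uint64_t pending_bytes, uint64_t bandwidth_bps,
                               uint64_t downtime_limit_ms, double data_ratio)
    {
        uint64_t data_budget_ms = (uint64_t)(downtime_limit_ms * data_ratio);
        return pending_bytes * 1000 / bandwidth_bps <= data_budget_ms;
    }

    /*
     * The new abort cap then compares the measured total blackout against
     * the same downtime_limit, so there is only one notion of "max
     * downtime" for the user to reason about.
     */
    static bool should_abort(uint64_t measured_blackout_ms,
                             uint64_t downtime_limit_ms)
    {
        return measured_blackout_ms > downtime_limit_ms;
    }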

That said, I also wonder whether you have thought about improving
downtime_limit itself.  It would definitely be better if we simply don't
switch over at all when the downtime target can't be met.  IOW, I wonder
whether there can be a case where the user specifies these parameters and
the migration simply keeps switching over and aborting, requiring a retry.
That's pretty unwanted.  We may simply not allow that switchover to happen
at all.

Thanks,

-- 
Peter Xu



