Re: [PATCH 0/3] migration: Downtime tracepoints
From: Peter Xu
Subject: Re: [PATCH 0/3] migration: Downtime tracepoints
Date: Mon, 30 Oct 2023 11:13:55 -0400
On Fri, Oct 27, 2023 at 11:17:41PM +0100, Joao Martins wrote:
> On 27/10/2023 15:41, Peter Xu wrote:
> > On Fri, Oct 27, 2023 at 09:58:03AM +0100, Joao Martins wrote:
> >> On 26/10/2023 21:07, Peter Xu wrote:
> >>> On Thu, Oct 26, 2023 at 08:33:13PM +0100, Joao Martins wrote:
> >>>> Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
> >>>> considering it started on the other patches (if you also agree it is
> >>>> right). The patches are of course entirely different, but at least I
> >>>> like to believe the ideas initially presented and then subsequently
> >>>> improved are what led to the downtime observability improvements in
> >>>> this series.
> >>>
> >>> Sure, I'll add that.
> >>>
> >>> If you like, I would definitely be happy to have Co-developed-by: with
> >>> you.
> >>
> >> Oh, that's great, thanks!
> >
> > Great! I apologize for not asking before posting a formal patch.
> >
> >>
> >>> I just don't know whether that addresses all your needs, and I need
> >>> some patch like that for our builds.
> >>
> >> I think it achieves the same as the other series. Or rather, it
> >> re-implements it with less compromise on QAPI and makes the tracepoints
> >> more 'generic', usable even for other use cases and less specific to the
> >> 'checkpoint breakdown'. That makes the implementation simpler (e.g. we
> >> don't need that array storing the checkpoint timestamps) given that it's
> >> just tracing and not for QAPI.
> >
> > Yes. Please also feel free to have a closer look at the exact checkpoints
> > in that patch. I just want to make sure that it'll serve the same purpose
> > as the patch you proposed, but with tracepoints, and that I don't miss
> > anything important.
> >
> > The dest checkpoints are all new; I hope I nailed them all as we would
> > expect.
> >
> Yeah, it looks like so; but I am no master of postcopy, so I don't quite
> have the surgical eye there.
>
> Perhaps for the loadvm side, 'precopy-bh-enter' suggests some ambiguity
> (given that precopy happens on both source and destination?). Perhaps be
> explicit with something like incoming-bh-enter, or even prefix checkpoint
> names with 'source' on the source side, for example, to distinguish?
I don't think the src has a bottom half, if we don't consider cleanup_bh as
part of migration at all; the destination's BH is a major part of migration.
In any case, let me prefix them with "src"/"dst" then; hopefully that's even
clearer.
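
For illustration, a minimal sketch of what the prefixed checkpoints could
look like (the event name and the checkpoint strings are hypothetical, not
necessarily what the final patch uses), assuming a single string-typed trace
event declared in migration/trace-events:

    /* migration/trace-events (sketch):
     *   migration_downtime_checkpoint(const char *checkpoint) "%s"
     */

    /* Source side, at downtime start: */
    trace_migration_downtime_checkpoint("src-downtime-start");

    /* Destination side, on entry of the incoming-migration bottom half: */
    trace_migration_downtime_checkpoint("dst-precopy-bh-enter");

With the host role encoded in the checkpoint name itself, a merged trace
from both sides stays unambiguous.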
>
> > For src checkpoints, IIRC your patch explicitly tracked return path
> > closing, while patch 4 only made it part of the final interval; the 4th
> > checkpoint is after the non-iterable data is sent, and the 5th is the
> > last one at "downtime-end". It covers more than "return path close":
> >
> > qemu_savevm_state_complete_precopy_non_iterable <--- 4th checkpoint
> > qemu_fflush (after non-iterable sent)
> > close_return_path_on_source
> > cpu_throttle_stop
> > qemu_mutex_lock_iothread
> > migration_downtime_end <---- 5th checkpoint
> >
> > If you see fit, I can, for example, also add one more checkpoint right
> > after closing the return path. I just didn't know whether it was your
> > intention to explicitly track that point. Just let me know if so.
> >
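
As a concrete sketch of the above (call sequence as quoted; the checkpoint
strings and the optional extra tracepoint are hypothetical):

    qemu_savevm_state_complete_precopy_non_iterable()
    trace_migration_downtime_checkpoint("src-non-iterable-saved")   /* 4th */
    qemu_fflush()                        /* after non-iterable sent */
    close_return_path_on_source()
    /* optional, if tracking return-path close explicitly:
     * trace_migration_downtime_checkpoint("src-return-path-closed")
     */
    cpu_throttle_stop()
    qemu_mutex_lock_iothread()
    migration_downtime_end()
    trace_migration_downtime_checkpoint("src-downtime-end")         /* 5th */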
> It was put there because it was a simple catch-all from the source PoV of
> how long the destination is taking. Of course, bearing in mind that without
> a return path such a metric/tracepoint is not available. On the other hand,
> with migration_downtime_end I think we have the equivalent in that resume
> checkpoint. So we just need to take the diff between that and the
> non-iterable one, and I think we have the same thing I was looking for
> (even more, I would say).
>
> > Also, on whether you prefer to keep a timestamp in the tracepoint itself:
> > I only use either "log" or "dtrace"+qemu-trace-stap for tracing, and both
> > of them contain timestamps already. But I can also add the timestamps
> > (offset by downtime_start) if you prefer.
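
For reference, enabling such an event with the "log" backend would be along
these lines (event name as in the hypothetical sketch above):

    qemu-system-x86_64 -trace 'migration_downtime_checkpoint' ...

Each line the log backend emits is already prefixed with a host timestamp,
which is what the quoted paragraph relies on; the dtrace backend gets its
timestamps from SystemTap via qemu-trace-stap instead.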
> >
> Perhaps it is easy to wrap the checkpoint tracepoint in its own function to
> allow extending it later, e.g. to add the timestamp (or any other data) to
> the checkpoints, or to do something at a particular checkpoint of the
> migration. The timing function from QEMU would give QEMU's own idea of time
> (directly correlatable with the downtime metric that applications consume),
> which would be more consistent? Though I am in two minds on whether to rely
> on tracing timestamps or to align with what's provided to applications.
Yes, it should be more consistent. Let me add them to the checkpoints.
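
For illustration, a minimal sketch of such a wrapper, assuming
QEMU_CLOCK_REALTIME as the time source (the helper name and the trace event
are hypothetical; MigrationState.downtime_start is the field the downtime
metric already uses):

    #include "qemu/osdep.h"
    #include "qemu/timer.h"
    #include "migration.h"
    #include "trace.h"

    /*
     * Wrap the checkpoint tracepoint so that every checkpoint also reports
     * QEMU's own notion of elapsed time, offset from downtime_start.
     * Using the same clock that feeds the user-visible downtime metric
     * keeps the per-checkpoint offsets directly correlatable with what
     * applications consume.
     */
    static void migration_trace_checkpoint(MigrationState *s,
                                           const char *checkpoint)
    {
        int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

        trace_migration_downtime_checkpoint(checkpoint,
                                            now - s->downtime_start);
    }

Keeping the offset as an explicit trace argument makes it available to any
backend, independent of the backend's own timestamps.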
Thanks,
--
Peter Xu