Re: [Qemu-ppc] [Qemu-devel] Migrating decrementer
From: David Gibson
Subject: Re: [Qemu-ppc] [Qemu-devel] Migrating decrementer
Date: Wed, 3 Feb 2016 15:59:26 +1100
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Feb 02, 2016 at 11:41:40PM +0000, Mark Cave-Ayland wrote:
> On 01/02/16 00:52, David Gibson wrote:
>
> >> Thanks for more pointers - I think I'm slowly getting there. My current
> >> thoughts are that the basic migration algorithm is doing the right thing
> >> in that it works out the number of host ticks different between source
> >> and destination.
> >
> > > Sorry, I've taken a while to reply to this. I realised the tb
> > migration didn't work the way I thought it did, so I've had to get my
> > head around what's actually going on.
>
> No problem - it's turning out to be a lot more complicated than I
> initially expected.
>
> > I had thought that it transferred only meta-information telling the
> > destination how to calculate the timebase, without actually working
> > out the timebase value at any particular moment.
> >
> > In fact, what it sends is basically the tuple of (timebase, realtime)
> > at the point of sending the migration stream. The destination then
> > uses that to work out how to compute the timebase from realtime there.
> >
> > I'm not convinced this is a great approach, but it should basically
> > work. However, as you've seen there are also some Just Plain Bugs in
> > the logic for this.
> >
> >> I have a slight query with this section of code though:
> >>
> >> migration_duration_tb = muldiv64(migration_duration_ns, freq,
> >>                                  NANOSECONDS_PER_SECOND);
> >>
> >> This is not technically correct on TCG x86 since the timebase is the x86
> >> TSC which is running somewhere in the GHz range, compared to freq which
> >> is hard-coded to 16MHz.
> >
> > Um.. what? AFAICT that line doesn't have any reference to the TSC
> > > speed. Just ns and the (guest) tb. Also 16MHz is only for the
> > oldworld Macs - modern ppc cpus have the TB frequency architected as
> > 512MHz.
>
> On TCG the software timebase for the Mac guests is fixed at 16MHz so how
> does KVM handle this? Does it compensate by emulating the 16MHz timebase
> for the guest even though the host has a 512MHz timebase?

No, it can't. The timebase is not privileged, so there's no way to
virtualize it for the guest. So, the best we can do is to detect KVM,
override the guest device tree with the host timebase frequency and
hope the guest reads it rather than assuming a fixed value for the
platform.
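
As an illustration of the point that the quoted conversion depends only on
nanoseconds and the guest tb frequency, here is a minimal sketch (not the
actual QEMU code). muldiv64_sketch() and ns_to_guest_tb() are hypothetical
names; the first is a stand-in for QEMU's muldiv64() and assumes a compiler
with the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000ULL

/*
 * Stand-in for QEMU's muldiv64(): computes a * b / c with a 128-bit
 * intermediate so the product cannot overflow 64 bits.
 * Assumption: the compiler provides unsigned __int128 (GCC/Clang).
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    return (uint64_t)(((unsigned __int128)a * b) / c);
}

/*
 * Convert an elapsed interval in nanoseconds into guest timebase ticks.
 * Only the guest tb frequency appears here; no host tick rate (such as
 * the x86 TSC frequency) is involved.
 */
static uint64_t ns_to_guest_tb(uint64_t ns, uint32_t tb_freq_hz)
{
    return muldiv64_sketch(ns, tb_freq_hz, NANOSECONDS_PER_SECOND);
}
```

With this, one second of migration time is 16000000 ticks for a 16MHz
guest timebase and 512000000 ticks for a 512MHz one, whatever the host is.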

> >> However this doesn't seem to matter because the
> >> timebase adjustment is limited to a maximum of 1s. Why should this be if
> >> the timebase is supposed to be free running as you mentioned in a
> >> previous email?
> >
> > AFAICT, what it's doing here is assuming that if the migration
> > duration is >1s (or appears to be >1s) then it's because the host
> > clocks are out of sync and so just capping the elapsed tb time at 1s.
> >
> > That's just wrong, IMO. 1s is a long downtime for a live migration,
> > but it's not impossible, and it will happen nearly always in the
> > > scenario you've discussed of manually loading the migration stream
> > from a file.
> >
> > But more to the point, trying to maintain correctness of the timebase
> > when the hosts are out of sync is basically futile. There's no other
> > reference we can use, so all we can achieve is getting a different
> > wrong value from what we'd get by blindly trusting the host clock.
> >
> > We do need to constrain the tb from going backwards, because that will
> > cause chaos on the guest, but otherwise we should just trust the host
> > clock and ditch that 1s clamp. If the hosts are out of sync, then
> > guest time will jump, but that was always going to happen.
>
> Going back to your earlier email you suggested that the host timebase is
> always continuously running, even when the guest is paused. But then
> when the guest is resumed, the timebase must jump in the guest regardless?
>
> If this is the case then this is the big difference between TCG and KVM
> guests: TCG timebase is derived from the virtual clock which solves the
> problem of paused guests during migration. For example with the existing
> migration code, what would happen if you did a migration with the guest
> paused on KVM? The offset would surely be wrong as it was calculated at
> the end of migration.

So there are two different cases to consider here. One is when the
guest is paused incidentally, such as during migration, the other is
when the guest is explicitly paused.

In the first case the timebase absolutely should keep running (or
appear to do so), since it's the primary source of real time for the
guest.

In the second case, it's a bit unclear what the right thing to do is.
Keeping the tb running means accurate realtime, but stopping it is
often better for debugging, which is one of the main reasons to
explicitly pause.

I believe spapr on KVM HV will keep the TB going, but the TSC on x86
will be stopped.
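
For the incidental-pause case, the destination-side calculation could look
roughly like the sketch below: trust the host clock for the elapsed time,
apply no 1s clamp, and only refuse to let the tb go backwards. This is an
illustration under assumptions, not the existing QEMU code; struct
tb_snapshot and tb_after_migration() are hypothetical names, and unsigned
__int128 is a GCC/Clang extension.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000LL

/*
 * Hypothetical migration record: the guest timebase and the host
 * real time (in ns), sampled together on the source.
 */
struct tb_snapshot {
    uint64_t guest_tb;
    int64_t  host_ns;
};

/*
 * Guest timebase the destination should resume from, given its own
 * host real time. The elapsed time is taken from the host clocks as-is:
 * there is no upper clamp, so a long downtime is reflected in the tb.
 */
static uint64_t tb_after_migration(const struct tb_snapshot *src,
                                   int64_t dest_host_ns,
                                   uint32_t tb_freq_hz)
{
    int64_t elapsed_ns = dest_host_ns - src->host_ns;

    /* Out-of-sync host clocks must not make the guest tb go backwards. */
    if (elapsed_ns < 0) {
        elapsed_ns = 0;
    }

    /* 128-bit intermediate so elapsed_ns * tb_freq_hz cannot overflow. */
    return src->guest_tb +
           (uint64_t)(((unsigned __int128)elapsed_ns * tb_freq_hz) /
                      NS_PER_SEC);
}
```

If the destination clock is 3 seconds ahead of the snapshot, a 512MHz
timebase advances by 3 * 512000000 ticks; if it appears to be behind, the
guest simply resumes from the saved value.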

> And another thought: should it be possible to migrate guests between TCG
> and KVM hosts at will?

It would be nice if that's possible, but I don't think it's a primary goal.

> >> AFAICT the main problem on TCG x86 is that post-migration the timebase
> >> calculated by cpu_ppc_get_tb() is incorrect:
> >>
> >> uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk,
> >>                         int64_t tb_offset)
> >> {
> >>     /* TB time in tb periods */
> >>     return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) +
> >>            tb_offset;
> >> }
> >
> >
> > So the problem here is that get_ticks_per_sec() (which always returns
> > 1,000,000,000) is not talking about the same ticks as
> > cpu_get_host_ticks(). That may not have been true when this code was
> > written.
>
> Yes. That's basically what I was trying to say but I think you've
> expressed it far more eloquently than I did.
>
> >> For a typical savevm/loadvm pair I see something like this:
> >>
> >> savevm:
> >>
> >> tb->guest_timebase = 26281306490558
> >> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> >>
> >> loadvm:
> >>
> >> cpu_get_host_ticks() = 26289847005259
> >> tb_off_adj = -8540514701
> >> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
> >> cpu_ppc_get_tb() = -15785159386
> >>
> >> But as cpu_ppc_get_tb() uses QEMU_CLOCK_VIRTUAL for vmclk we end up with
> >> a negative number for the timebase since the virtual clock is dwarfed by
> >> the number of TSC ticks calculated for tb_off_adj. This will work on a
> >> PPC host though since cpu_get_host_ticks() is also derived from the
> >> timebase.
> >
> > > Yeah, we shouldn't be using cpu_get_host_ticks() at all - or anything
> > else which depends on a host frequency. We should only be using qemu
> > interfaces which work in real time units (nanoseconds, usually).
>
> I agree that this is the right way forward. Unfortunately the timebase
> behaviour under KVM PPC is quite new to me, so please do bear with me
> for asking all these questions.
>
>
> ATB,
>
> Mark.
>
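
To make the unit mismatch in the quoted savevm/loadvm trace concrete, here
is a small sketch using those numbers. It is not the QEMU implementation:
the function names are hypothetical, the 16MHz tb frequency is the TCG Mac
value mentioned above, and the subtraction in the first function is only
inferred from the trace (it reproduces the logged tb_off_adj). __int128 is
a GCC/Clang extension.

```c
#include <stdint.h>

/* Values copied from the savevm/loadvm trace quoted above. */
#define SAVED_GUEST_TB    26281306490558LL  /* tb->guest_timebase */
#define LOAD_HOST_TICKS   26289847005259LL  /* cpu_get_host_ticks() */
#define VIRTUAL_CLOCK_NS  7040725511LL      /* QEMU_CLOCK_VIRTUAL, in ns */

/*
 * Adjustment in host tick units (TSC ticks on x86). With the trace
 * values this gives the logged -8540514701, which dwarfs anything
 * derived from the nanosecond virtual clock.
 */
static int64_t tb_off_adj_from_host_ticks(int64_t guest_tb,
                                          int64_t host_ticks)
{
    return guest_tb - host_ticks;
}

/* Timebase derived from the virtual clock (ns) plus an offset. */
static int64_t tb_from_virtual_ns(int64_t vmclk_ns, uint32_t tb_freq_hz,
                                  int64_t tb_offset)
{
    return (int64_t)(((__int128)vmclk_ns * tb_freq_hz) / 1000000000) +
           tb_offset;
}

/*
 * Nanosecond-only alternative: pick the offset so that the timebase
 * computed from the virtual clock equals the saved guest timebase at
 * load time. No host tick frequency is involved.
 */
static int64_t tb_offset_from_virtual_ns(int64_t guest_tb, int64_t vmclk_ns,
                                         uint32_t tb_freq_hz)
{
    return guest_tb - tb_from_virtual_ns(vmclk_ns, tb_freq_hz, 0);
}
```

Feeding the result of tb_offset_from_virtual_ns() back into
tb_from_virtual_ns() at the same virtual clock value returns the saved
guest timebase, so the guest sees no jump and no negative value.
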
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson