[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference o
From: |
Marcelo Tosatti |
Subject: |
Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration |
Date: |
Mon, 14 Nov 2016 13:40:18 -0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote:
>
>
> On 14/11/2016 15:50, Marcelo Tosatti wrote:
> > Well, i didnt want to mix the meaning of the variables:
> >
> > + /* whether machine supports reliable KVM_GET_CLOCK */
> > + bool mach_use_reliable_get_clock;
> > +
> > + /* whether source host supported reliable KVM_GET_CLOCK */
> > + bool src_use_reliable_get_clock;
> >
> > See the comments on top (later if you look at the variable,
> > then have to think: well it has one name, but its disabled
> > by that other path as well, so its more than its
> > name,etc...).
> >
> >> I'm thinking "mach_use_reliable_get_clock is just for migration,
> >
> > Thats whether the machine supports it. New machines have it enabled,
> > olders don't.
>
> Yes.
>
> >> src_use_reliable_get_clock is the state".
> >
> > Thats whether the migration source supported it.
>
> But it's not used only for migration. It's used on every vmstate change
> (running->stop and stop->running, isn't it?
No. Its used only if s->clock_valid == false, which means either:
migration/savevm/init.
Now for savevm there is a bug:
> I think that, apart from
> the migration case, it's better to use s->clock if kvmclock is stable,
> even on older machine types.
Yes, its already doing that:
/* local (running VM) restore */
if (s->clock_valid) {
/*
* if host does not support reliable KVM_GET_CLOCK,
* read kvmclock value from memory
*/
if (!kvm_has_adjust_clock_stable()) {
time_at_migration = kvmclock_current_nsec(s);
}
It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is
false.
> >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> >>>>> +{
> >>>>> + KVMClockState *s = opaque;
> >>>>> +
> >>>>> + /*
> >>>>> + * On machine types that support reliable KVM_GET_CLOCK,
> >>>>> + * if host kernel does provide reliable KVM_GET_CLOCK,
> >>>>> + * set src_use_reliable_get_clock=true so that destination
> >>>>> + * avoids reading kvmclock from memory.
> >>>>> + */
> >>>>> + if (s->mach_use_reliable_get_clock &&
> >>>>> kvm_has_adjust_clock_stable()) {
> >>>>> + s->src_use_reliable_get_clock = true;
> >>>>> + }
> >>>>> +
> >>>>> + return s->src_use_reliable_get_clock;
> >>>>> +}
> >>>>
> >>>> Here you can just return s->mach_use_reliable_get_clock.
> >>>
> >>> mach_use_reliable_get_clock can be true but host might not support it.
> >>
> >> Yes, but the "needed" function is only required to avoid breaking
> >> pc-i440fx-2.7 and earlier.
> >
> > "needed" is required so that the migration between:
> >
> > SRC DEST BEHAVIOUR
> > ~support supports on migration read from guest,
> > on stop/cont use
> > kvm_get_clock/kvm_set_clock
> >
> > Destination does not use KVM_GET_CLOCK value (which is
> > broken and should not be used).
>
> If needed returns false, the destination will see
> src_use_reliable_get_clock = false anyway.
Causing it to read the clock from memory, thats what
should happen.
> If needed returns true, the destination can still see
> src_use_reliable_get_clock = false if that's the value.
You are asking me to change from:
+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+ KVMClockState *s = opaque;
+
+ /*
+ * On machine types that support reliable KVM_GET_CLOCK,
+ * if host kernel does provide reliable KVM_GET_CLOCK,
+ * set src_use_reliable_get_clock=true so that destination
+ * avoids reading kvmclock from memory.
+ */
+ if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+ s->src_use_reliable_get_clock = true;
+ }
+
+ return s->src_use_reliable_get_clock;
+}
to
static bool kvmclock_src_use_reliable_get_clock(void *opaque)
{
KVMClockState *s = opaque;
/*
* On machine types that support reliable KVM_GET_CLOCK,
* if host kernel does provide reliable KVM_GET_CLOCK,
* set src_use_reliable_get_clock=true so that destination
* avoids reading kvmclock from memory.
*/
if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
{
s->src_use_reliable_get_clock = true;
}
return s->mach_use_reliable_get_clock;
}
Ah, OK, done.
> needed src_use_reliable_get_clock effect
> false false use kvmclock_current_nsec
> false true use kvmclock_current_nsec
> true false use kvmclock_current_nsec
> true true use s->clock
>
>
> So the idea is:
>
> - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK
Already do that (apart from s->clock_valid which is necessary).
> - on migration source, use subsections and
> s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8
> machine types
Migration is already broken when you use different machine types.
So s->src_use_reliable_get_clock is only used to indicate
to the destination that: "you can use KVM_GET_CLOCK value,
its safe".
> - on migration destination, use .pre_load so that s->reliable_get_clock
> is initialized to false on older machine types
Thats what mach_use_reliable_get_clock does already:
static Property kvmclock_properties[] = {
DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
mach_use_reliable_get_clock, true),
DEFINE_PROP_END_OF_LIST(),
};
+ .driver = "kvmclock",\
+ .property = "mach_use_reliable_get_clock",\
+ .value = "off",\
+ },\
+ {\
>
> >> If you return true here, you can still
> >> migrate a "false" value for src_use_reliable_get_clock.
> >
> > But the source only uses a reliable KVM_GET_CLOCK if
> > both conditions are true.
> >
> > And the subsection is only needed if the source
> > uses a reliable KVM_GET_CLOCK.
>
> Yes, but the point of the subsection is just to avoid breaking migration
> format.
Its a new creative use of the subsection.
> >> It is the same as "is using masterclock", which is actually a stricter
> >> condition than the KVM_CHECK_EXTENSION return value. The right check to
> >> use is whether masterclock is in use,
> >
> > Actually its "has a reliable KVM_GET_CLOCK" (which returns
> > get_kernel_clock() + (rdtsc() - tsc_timestamp),
> >
> > "broken KVM_GET_CLOCK" = get_kernel_clock()
>
> Yes. And these end up being the same.
No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock
flags, and unreliable KVM_GET_CLOCK (without your patch to
KVM_GET_CLOCK).
Paolo not sure if there is anything further you want me to
change at this point ?
Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
===================================================================
--- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c 2016-11-14
10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/hw/i386/kvm/clock.c 2016-11-14 13:38:29.299955042
-0200
@@ -36,6 +36,12 @@
uint64_t clock;
bool clock_valid;
+
+ /* whether machine supports reliable KVM_GET_CLOCK */
+ bool mach_use_reliable_get_clock;
+
+ /* whether source host supported reliable KVM_GET_CLOCK */
+ bool src_use_reliable_get_clock;
} KVMClockState;
struct pvclock_vcpu_time_info {
@@ -81,6 +87,19 @@
return nsec + time.system_time;
}
+static uint64_t kvm_get_clock(void)
+{
+ struct kvm_clock_data data;
+ int ret;
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+ if (ret < 0) {
+ fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+ abort();
+ }
+ return data.clock;
+}
+
static void kvmclock_vm_state_change(void *opaque, int running,
RunState state)
{
@@ -91,15 +110,37 @@
if (running) {
struct kvm_clock_data data = {};
- uint64_t time_at_migration = kvmclock_current_nsec(s);
+ uint64_t pvclock_via_mem = 0;
- s->clock_valid = false;
+ /* local (running VM) restore */
+ if (s->clock_valid) {
+ /*
+ * if host does not support reliable KVM_GET_CLOCK,
+ * read kvmclock value from memory
+ */
+ if (!kvm_has_adjust_clock_stable()) {
+ pvclock_via_mem = kvmclock_current_nsec(s);
+ }
+ /* migration/savevm/init restore */
+ } else {
+ /*
+ * use s->clock in case machine uses reliable
+ * get clock and source host supported
+ * reliable get clock
+ */
+ if (!(s->mach_use_reliable_get_clock &&
+ s->src_use_reliable_get_clock)) {
+ pvclock_via_mem = kvmclock_current_nsec(s);
+ }
+ }
- /* We can't rely on the migrated clock value, just discard it */
- if (time_at_migration) {
- s->clock = time_at_migration;
+ /* We can't rely on the saved clock value, just discard it */
+ if (pvclock_via_mem) {
+ s->clock = pvclock_via_mem;
}
+ s->clock_valid = false;
+
data.clock = s->clock;
ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
if (ret < 0) {
@@ -120,8 +161,6 @@
}
}
} else {
- struct kvm_clock_data data;
- int ret;
if (s->clock_valid) {
return;
@@ -129,13 +168,7 @@
kvm_synchronize_all_tsc();
- ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
- if (ret < 0) {
- fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
- abort();
- }
- s->clock = data.clock;
-
+ s->clock = kvm_get_clock();
/*
* If the VM is stopped, declare the clock state valid to
* avoid re-reading it on next vmsave (which would return
@@ -152,22 +185,69 @@
qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
}
+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+ KVMClockState *s = opaque;
+
+ /*
+ * On machine types that support reliable KVM_GET_CLOCK,
+ * if host kernel does provide reliable KVM_GET_CLOCK,
+ * set src_use_reliable_get_clock=true so that destination
+ * avoids reading kvmclock from memory.
+ */
+ if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+ s->src_use_reliable_get_clock = true;
+ }
+
+ return s->mach_use_reliable_get_clock;
+}
+
+static const VMStateDescription kvmclock_reliable_get_clock = {
+ .name = "kvmclock/src_use_reliable_get_clock",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = kvmclock_src_use_reliable_get_clock,
+ .fields = (VMStateField[]) {
+ VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+static void kvmclock_pre_save(void *opaque)
+{
+ KVMClockState *s = opaque;
+
+ s->clock = kvm_get_clock();
+}
+
static const VMStateDescription kvmclock_vmsd = {
.name = "kvmclock",
.version_id = 1,
.minimum_version_id = 1,
+ .pre_save = kvmclock_pre_save,
.fields = (VMStateField[]) {
VMSTATE_UINT64(clock, KVMClockState),
VMSTATE_END_OF_LIST()
+ },
+ .subsections = (const VMStateDescription * []) {
+ &kvmclock_reliable_get_clock,
+ NULL
}
};
+static Property kvmclock_properties[] = {
+ DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
+ mach_use_reliable_get_clock, true),
+ DEFINE_PROP_END_OF_LIST(),
+};
+
static void kvmclock_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->realize = kvmclock_realize;
dc->vmsd = &kvmclock_vmsd;
+ dc->props = kvmclock_properties;
}
static const TypeInfo kvmclock_info = {
Index: qemu-mig-advance-clock/include/hw/i386/pc.h
===================================================================
--- qemu-mig-advance-clock.orig/include/hw/i386/pc.h 2016-11-14
10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/include/hw/i386/pc.h 2016-11-14 10:40:48.059128649
-0200
@@ -370,6 +370,11 @@
#define PC_COMPAT_2_7 \
HW_COMPAT_2_7 \
{\
+ .driver = "kvmclock",\
+ .property = "mach_use_reliable_get_clock",\
+ .value = "off",\
+ },\
+ {\
.driver = TYPE_X86_CPU,\
.property = "l3-cache",\
.value = "off",\
Index: qemu-mig-advance-clock/target-i386/kvm.c
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm.c 2016-11-14
10:40:39.750116314 -0200
+++ qemu-mig-advance-clock/target-i386/kvm.c 2016-11-14 10:40:48.061128652
-0200
@@ -117,6 +117,13 @@
return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
}
+bool kvm_has_adjust_clock_stable(void)
+{
+ int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
+
+ return (ret == KVM_CLOCK_TSC_STABLE);
+}
+
bool kvm_allows_irq0_override(void)
{
return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
Index: qemu-mig-advance-clock/target-i386/kvm_i386.h
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h 2016-11-14
10:40:39.751116316 -0200
+++ qemu-mig-advance-clock/target-i386/kvm_i386.h 2016-11-14
10:40:48.061128652 -0200
@@ -17,6 +17,7 @@
bool kvm_allows_irq0_override(void);
bool kvm_has_smm(void);
+bool kvm_has_adjust_clock_stable(void);
void kvm_synchronize_all_tsc(void);
void kvm_arch_reset_vcpu(X86CPU *cs);
void kvm_arch_do_init_vcpu(X86CPU *cs);
- [Qemu-devel] [qemu patch 0/2] improve kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- [Qemu-devel] [qemu patch 1/2] kvm: sync linux headers, Marcelo Tosatti, 2016/11/14
- [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration,
Marcelo Tosatti <=
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/14
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/17
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/17
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/28
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Eduardo Habkost, 2016/11/28
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Paolo Bonzini, 2016/11/28
- Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration, Marcelo Tosatti, 2016/11/28