qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older c


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types
Date: Wed, 29 Aug 2012 09:56:12 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Aug 29, 2012 at 01:06:32PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 28, 2012 at 08:50:09PM -0300, Eduardo Habkost wrote:
> > On Wed, Aug 29, 2012 at 01:25:35AM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Aug 29, 2012 at 01:21:13AM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 28, 2012 at 07:02:42PM -0300, Eduardo Habkost wrote:
> > > > > On Wed, Aug 29, 2012 at 12:35:28AM +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Aug 28, 2012 at 04:13:38PM -0300, Eduardo Habkost wrote:
> > > > > > > On Tue, Aug 28, 2012 at 08:43:52PM +0300, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > In preparation for adding PV EOI support, disable PV EOI by 
> > > > > > > > default for
> > > > > > > > 1.1 and older machine types, to avoid CPUID changing during 
> > > > > > > > migration.
> > > > > > > > 
> > > > > > > > PV EOI can still be enabled/disabled by specifying it 
> > > > > > > > explicitly.
> > > > > > > >     Enable for 1.1
> > > > > > > >     -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
> > > > > > > >     Disable for 1.2
> > > > > > > >     -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
> > > > > > > > 
> > > > > > > 
> > > > > > > What about users that are already running "qemu-1.1 -M pc-1.1" on 
> > > > > > > a host
> > > > > > > kernel that supports PV EOI already? They would get PV EOI 
> > > > > > > disabled when
> > > > > > > migrating to a destination running "qemu-1.2 -M pc-1.1".
> > > > > > > 
> > > > > > > (On the other hand, people running "qemu-1.1 -M pc-1.1" on a host 
> > > > > > > kernel
> > > > > > > supporting PV EOI already have migration broken, so there's not 
> > > > > > > much we
> > > > > > > can do for them)
> > > > > > 
> > > > > > Exactly.
> > > > > > 
> > > > > > Talked to Gleb, long term I think we should rework code to make
> > > > > > it forward-compatible wrt adding new MSRs:
> > > > > > - source gets list of MSRs to be migrated from KVM and simply sends 
> > > > > > them all
> > > > > > - send all MSRS in key/value format
> > > > > > - destination gets list of MSRs to be migrated from KVM and
> > > > > >   only restores the supported ones
> > > > > 
> > > > > As far as I understand the migration code requirements/expectations, 
> > > > > if
> > > > > the origin is sending some data, it is because it is part of the
> > > > > guest-visible machine state that must be kept while migrating. Because
> > > > > of that, the destination is not allowed to drop anything it doesn't 
> > > > > know
> > > > > about.
> > > > 
> > > > 
> > > > We have a ton of code that reads in values then just
> > > > ignores them, for compat with old qemu.
> > 
> > The difference is that you ignore only the values you _know_ to be safe
> > to be ignored. You can't ignore a value simply because you don't know
> > what it means, and if it's important or not.
> > > > This will be exactly such a case:
> > > > we don't drop anything - protocol does not
> > > > support this. We read and simply do not tell kvm
> > > > about it.
> > 
> > This fits the definition of "dropping", to me.
> > 
> > > We also have tons of code that sends
> > > > useless values again for compatibility.
> > 
> > But these values should be ignored only if the receiver knows exactly
> > what it means, and knows exactly why/when it can be ignored.
> 
> Sorry I just do not understand these meta arguments.  Do you mean
> the example below? If yes let us focus on it please.

We have to have these meta arguments, because the problem is about
features/MSRs that will be introduced in the future. But okay, let's
follow with the specific argument:

> 
> > > > 
> > > > > At the same time, if it's not part of guest-visible machine
> > > > > state, it doesn't have to be sent by the migration origin.
> > 
> > It may be not directly visible, but if it's part of the machine state,
> > it affects the guest in some way. If it didn't affect the guest in any
> > way, we wouldn't even have to send it while migrating, in the first
> > place.
> > 
> > > > > 
> > > > 
> > > > False, we often send internal device state which is not
> > > > directly guest visible.
> > > 
> > > Besides, all MSRs in KVM's MSR list are actually guest visible, even if
> > > guests normally do not invoke these MSRs - they can.
> > 
> > The difference is that the machine doesn't guarantee a MSR to be
> > migrated unless the corresponding feature bit reports the MSR as
> > supported.
> > 
> > e.g. using the PV EOI MSR without having the PV EOI CPUID bit set is
> > undefined behavior. The guest shouldn't do it.
> 
> So the problematic scenario involves a guest that sets PV EOI when CPUID
> bit is off on source. Why do we even care what happens on migration?

This is the case where there is _no_ problem. But the migration
destination can't know that, how can it decide if it's safe to drop the
MSR value or not?

> 
> > On the other hand, if the PV EOI CPUID bit is set, the MSR should be
> > always kept, no matter what.
> > 
> > This means that we need to migrate the MSR only if the corresponding
> > CPUID bit is enabled.
> > 
> 
> I do not follow. Why does it mean that?
> It seems completely safe to migrate this MSR no matter what.

It is completely safe to migrate the MSR, but it is completely unsafe to
_ignore_ the incoming MSR value. That's the problem.

> 
> > > 
> > > > > On the other hand, a mode of operation that doesn't require updating
> > > > > QEMU every time there's a new bit of guest-visible state to be 
> > > > > migrated
> > > > > would be nice (just like the "-cpu host" mode, that doesn't require
> > > > > updating QEMU for every new CPU feature, is nice for some use cases). 
> > > > > I
> > > > > just don't know how to make work with the current migration protocol.
> > > > > 
> > > > 
> > > > I don't understand. What is the problem with the proposal?
> > > > What will not work with our protocol?
> > > > Can you give an example please?
> > 
> > Yes:
> > 
> > Suppose kernel 3.7 introduces KVM_FOO_MSR and CPUID_KVM_FOO.
> > 
> > Also, suppose QEMU 1.2 doesn't know anything about KVM_FOO, because it
> > was release before this feature was introduced.
> > 
> > 
> > Then you run "qemu-1.2 -M pc-1.2" on a 3.7 host kernel. qemu-1.2 can do
> > two things here:
> > 
> > 1) Not enable CPUID_KVM_FOO
> > 
> > In this case, the guest should not use KVM_FOO_MSR and the MSR does not
> > need to be migrated (the guest may try to use it, but the behavior when
> > trying to use it is undefined). Sending the MSR value when migrating
> > would be useless.
> > 
> > 
> > 2) Enable CPUID_KVM_FOO.
> > 
> > In this case, the guest may try to use the feature and write something
> > into KVM_FOO_MSR. Sending the MSR value when migrating is absolutely
> > necessary
> > 
> > ---
> > 
> > Now assume you run "qemu-1.2 -M pc-1.2" in the destination host, running
> > the 3.6 kernel (without KVM_FOO).
> > 
> > Then qemu-1.2 receives the KVM_FOO_MSR data in the migration stream. In
> > this case, qemu-1.2 simply can't decide if it's safe to drop the data
> > (and not tell KVM about it), or not.
> > If we simply send every MSR reported by the kernel, both the origin and
> > destination qemu-1.2 processes can't ever know if the MSR value is
> > important or not, because it doesn't know if it's part of the machine
> > state that has to be kept consistent.
> > We could introduce a mode of operation where _every_ MSR reported by KVM
> > is important and sent by the origin (and also where every MSR must be
> > handled by the destination, otherwise migration would fail). But this
> > new mode would break migration compatibility between two hosts running
> > the same machine-type, only because they are running different kernel
> > versions. But it may be useful for some use cases, so maybe it's
> > appropriate for a future "pc-next" machine-type (and for "-cpu host"),
> > but not for "pc-<version>".
> > 
> 
> In this example, we should migrate CPUID (don't we?).
> Destination should validate that CPUID supplied by source
> matches what it can support (doesn't it?).
> 
> If we do and it does not, it's an unrelated bug:
> CPUID changing under guest's feet.

CPUID changing under guest's feet is another problem, that we also have
to solve. But we also have the problem of migration compatibility
between different host kernels.

Note that I am not saying that migrating all MSRs is wrong. It _is_
good, as long as:
- The destination never ignores any of the incoming MSR values.
- We don't do that by default on "pc-<version>", or we defeat the
  purpose of machine-types.

I'll try to enumerate the problems I am trying to address (that are
related, but are separate problems):

- MSR not being migrated when it should:
  - Possible solution: migrate all MSRs even if qemu doesn't know what
    they are.
    - Constraint: migration destination must _never_ ignore any incoming
      MSR value, because it can't decide if it is important to the guest
      or not (with the current KVM interfaces).
    - Problem with this solution: if we do that by default on
      "pc-<version>", we break migration compatibility between hosts
      with different kernel versions.
- Changing CPUID bits under guest's feet.
  - Proposed solution: migrating CPUID bits, refuse migration if
    destination doesn't support the same bits.
    - It solves the compatibility problem for migration to a newer
      kernel, but not to an older kernel. It helps to solve part of
      the problem, but not all.
  - Alternative solution: simply make the resulting CPUID bits not be a
    function of the host kernel capabilities, but only of the qemu
    configuration (except on "-cpu host" and "-M pc-next").
- Migration compatibility/predictability:
  - See my example above: feature introduced in a newer kernel,
    migration to an older kernel.
  - The only way I see to guarantee this is to never enable unknown
    CPUID bits or MSRs. People who don't care about predictable
    migration compatibility can use "-M pc-next", then.


> 
> > > > 
> > > > 
> > > > > > Too late for 1.2?
> > > > > 
> > > > > Absolutely (in my opinion).
> > > > > 
> > > > > > 
> > > > > > > While we don't make the KVM feature-bit handling sane (with 
> > > > > > > defaults
> > > > > > > that are not blindly derived from the host kernel capabilities), 
> > > > > > > maybe
> > > > > > > the safest bet is to expect users to not migrate between hosts 
> > > > > > > running
> > > > > > > kernels with different KVM capabilities? (I am not sure which 
> > > > > > > option is
> > > > > > > better)
> > > > > > 
> > > > > > Sorry not sure what you talk about here. What has KVM feature-bit
> > > > > > handling to do with this patchset?
> > > > > 
> > > > > Everything? The whole point of this patch is to filter out the PV_EOI
> > > > > KVM feature bit.
> > > > > 
> > > > > 
> > > > > This part of the current code, specifically, is wrong:
> > > > > 
> > > > > > > >      plus_kvm_features = ~0; /* not supported bits will be 
> > > > > > > > filtered out later */
> > > > > 
> > > > > The QEMU-side list of KVM features should be whitelist-based, not
> > > > > blacklist-based (unless the user doesn't need migration, in that case 
> > > > > he
> > > > > can use "-cpu host" and get every feature blindly enabled), because 
> > > > > QEMU
> > > > > can't know if a new feature involves guest-visible state that has to 
> > > > > be
> > > > > migrated.
> > > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > > > > > > > ---
> > > > > > > >  hw/Makefile.objs  |  2 +-
> > > > > > > >  hw/cpu_flags.c    | 32 ++++++++++++++++++++++++++++++++
> > > > > > > >  hw/cpu_flags.h    |  9 +++++++++
> > > > > > > >  hw/pc_piix.c      |  2 ++
> > > > > > > >  target-i386/cpu.c |  8 ++++++++
> > > > > > > >  5 files changed, 52 insertions(+), 1 deletion(-)
> > > > > > > >  create mode 100644 hw/cpu_flags.c
> > > > > > > >  create mode 100644 hw/cpu_flags.h
> > > > > > > > 
> > > > > > > > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > > > > > > > index 850b87b..3f2532a 100644
> > > > > > > > --- a/hw/Makefile.objs
> > > > > > > > +++ b/hw/Makefile.objs
> > > > > > > > @@ -1,5 +1,5 @@
> > > > > > > >  hw-obj-y = usb/ ide/
> > > > > > > > -hw-obj-y += loader.o
> > > > > > > > +hw-obj-y += loader.o cpu_flags.o
> > > > > > > >  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
> > > > > > > >  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
> > > > > > > >  hw-obj-y += fw_cfg.o
> > > > > > > > diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..7a633c0
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/hw/cpu_flags.c
> > > > > > > > @@ -0,0 +1,32 @@
> > > > > > > > +/*
> > > > > > > > + * CPU compatibility flags.
> > > > > > > > + *
> > > > > > > > + * Copyright (c) 2012 Red Hat Inc.
> > > > > > > > + * Author: Michael S. Tsirkin.
> > > > > > > > + *
> > > > > > > > + * This program is free software; you can redistribute it 
> > > > > > > > and/or modify
> > > > > > > > + * it under the terms of the GNU General Public License as 
> > > > > > > > published by
> > > > > > > > + * the Free Software Foundation; either version 2 of the 
> > > > > > > > License, or
> > > > > > > > + * (at your option) any later version.
> > > > > > > > + *
> > > > > > > > + * This program is distributed in the hope that it will be 
> > > > > > > > useful,
> > > > > > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty 
> > > > > > > > of
> > > > > > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See 
> > > > > > > > the
> > > > > > > > + * GNU General Public License for more details.
> > > > > > > > + *
> > > > > > > > + * You should have received a copy of the GNU General Public 
> > > > > > > > License along
> > > > > > > > + * with this program; if not, see 
> > > > > > > > <http://www.gnu.org/licenses/>.
> > > > > > > > + */
> > > > > > > > +#include "hw/cpu_flags.h"
> > > > > > > > +
> > > > > > > > +static bool kvm_pv_eoi_disabled_state;
> > > > > > > > +
> > > > > > > > +void disable_kvm_pv_eoi(void)
> > > > > > > > +{
> > > > > > > > +       kvm_pv_eoi_disabled_state = true;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +bool kvm_pv_eoi_disabled(void)
> > > > > > > > +{
> > > > > > > > +       return kvm_pv_eoi_disabled_state;
> > > > > > > > +}
> > > > > > > > diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..05777b6
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/hw/cpu_flags.h
> > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > +#ifndef HW_CPU_FLAGS_H
> > > > > > > > +#define HW_CPU_FLAGS_H
> > > > > > > > +
> > > > > > > > +#include <stdbool.h>
> > > > > > > > +
> > > > > > > > +void disable_kvm_pv_eoi(void);
> > > > > > > > +bool kvm_pv_eoi_disabled(void);
> > > > > > > > +
> > > > > > > > +#endif
> > > > > > > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> > > > > > > > index 008d42f..bdbceda 100644
> > > > > > > > --- a/hw/pc_piix.c
> > > > > > > > +++ b/hw/pc_piix.c
> > > > > > > > @@ -46,6 +46,7 @@
> > > > > > > >  #ifdef CONFIG_XEN
> > > > > > > >  #  include <xen/hvm/hvm_info_table.h>
> > > > > > > >  #endif
> > > > > > > > +#include "cpu_flags.h"
> > > > > > > >  
> > > > > > > >  #define MAX_IDE_BUS 2
> > > > > > > >  
> > > > > > > > @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
> > > > > > > >  
> > > > > > > >  static void pc_machine_v1_1_compat(void)
> > > > > > > >  {
> > > > > > > > +    disable_kvm_pv_eoi();
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static void pc_init_pci_v1_1(ram_addr_t ram_size,
> > > > > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > > > > > index 120a2e3..0d02fd1 100644
> > > > > > > > --- a/target-i386/cpu.c
> > > > > > > > +++ b/target-i386/cpu.c
> > > > > > > > @@ -23,6 +23,7 @@
> > > > > > > >  
> > > > > > > >  #include "cpu.h"
> > > > > > > >  #include "kvm.h"
> > > > > > > > +#include "asm/kvm_para.h"
> > > > > > > >  
> > > > > > > >  #include "qemu-option.h"
> > > > > > > >  #include "qemu-config.h"
> > > > > > > > @@ -33,6 +34,7 @@
> > > > > > > >  #include "hyperv.h"
> > > > > > > >  
> > > > > > > >  #include "hw/hw.h"
> > > > > > > > +#include "hw/cpu_flags.h"
> > > > > > > >  
> > > > > > > >  /* feature flags taken from "Intel Processor Identification 
> > > > > > > > and the CPUID
> > > > > > > >   * Instruction" and AMD's "CPUID Specification".  In cases of 
> > > > > > > > disagreement
> > > > > > > > @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t 
> > > > > > > > *x86_cpu_def, const char *cpu_model)
> > > > > > > >  
> > > > > > > >      plus_kvm_features = ~0; /* not supported bits will be 
> > > > > > > > filtered out later */
> > > > > > > >  
> > > > > > > > +    /* Disable PV EOI for old machine types.
> > > > > > > > +     * Feature flags can still override. */
> > > > > > > > +    if (kvm_pv_eoi_disabled()) {
> > > > > > > > +        plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >      add_flagname_to_bitmaps("hypervisor", &plus_features,
> > > > > > > >          &plus_ext_features, &plus_ext2_features, 
> > > > > > > > &plus_ext3_features,
> > > > > > > >          &plus_kvm_features, &plus_svm_features);
> > > > > > > > -- 
> > > > > > > > MST
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > -- 
> > > > > > > Eduardo
> > > > > 
> > > > > -- 
> > > > > Eduardo
> > 
> > -- 
> > Eduardo

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]