qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SM


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v3 for-2.9 0/3] q35: add negotiable broadcast SMI
Date: Thu, 24 Nov 2016 15:55:40 +0100

On Thu, 24 Nov 2016 01:01:58 +0100
Laszlo Ersek <address@hidden> wrote:

> CC Jordan & Mike
> 
> On 11/23/16 23:35, Paolo Bonzini wrote:
> > 
> > 
> > On 18/11/2016 11:36, Laszlo Ersek wrote:
> >> This is v3 of the series, with updates based on the v2 discussion:
> >> <http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02687.html>.
> >>
> >> I've added feature negotiation via the APM_STS ("scratchpad") register.
> >> A new spec file called "docs/specs/q35-apm-sts.txt" is included.
> >>
> >> Tested with new OVMF patches (about to send out those as well).
> >> Regression tested with SeaBIOS (beyond simple functional tests with
> >> maximum SeaBIOS logging enabled, I used gdb to step through the new
> >> ich9_apm_status_changed() callback to see if it was behaving compatibly
> >> with SeaBIOS).
> >>
> >> The series was developed and tested on top of v2.7.0, because v2.8.0-rc0
> >> crashes very quickly for me when running OVMF:
> >>
> >>   kvm_io_ioeventfd_add: error adding ioeventfd: File exists
> >>
> >> It is my understanding that there are patches on the list for this:
> >>
> >>   [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes
> >>
> >> Anyway, the series rebases to v2.8.0-rc0 without as much as context
> >> differences.
> > 
> > Hi Laszlo,
> > 
> > sorry for the slightly delayed reply.
> > 
> > First of all, I'm wondering if we would be better off adding a new port
> > 0xB1 that is QEMU-specific, instead of reusing 0xB3.
> 
> Sure, I can look into that, if we agree that's the best way to proceed,
> for now. (Although I'm not really happy about the new memory region
> stuff it would require. :()
> 
> I CC'd Kevin to learn if he foresaw other uses for the APM_STS register
> in SeaBIOS.
> 
> BTW, what happens in QEMU if the guest reads an unimplemented port?
> Hm... unassigned_io_write() seems to be a no-op, and
> unassigned_io_read() returns all-bits-one. This means that for a new
> port, the negotiation protocol / values have to be reworked.
> 
> Port 0xB1 is occupied by ICH9 according to the spec:
> 
>   Table 9-2. Fixed I/O Ranges Decoded by Intel ® ICH9 (Sheet 2 of 2)
> 
>   I/O
>   Address  Read Target           Write Target          Internal Unit
>   -------  --------------------  --------------------  -------------
>   B0h–B1h  Interrupt Controller  Interrupt Controller  Interrupt
> 
> I wonder if we care -- after all, APM_STS (0xB3) is documented not to
> have any hardware effects ("scratchpad register").
+1 in favor of using scratch register as Laszlo suggests.


> > Second, I now remembered the reason why I was against broadcast SMI.
> > The reason is that it breaks hot-plug.
> 
> How does it break hot-plug? After reading your explanation below: is it
> that the broadcast SMI (possibly raised by the OS directly) would get to
> the new VCPU before the firmware relocated its SMBASE?
> 
> > 
> > On hot-plug, the firmware (if it wants to use SMI for anything secure)
> > must relocate the SMBASE of the newly-hotplugged CPU before the OS has
> > any chance to put its fangs on it.  This is because the OS can send
> > direct SMIs and is in control of the area at 0x30000.  So OVMF is
> > already broken with respect to hotplug,
> 
> Yes. We theorized that there could be further edk2 core modules that
> hadn't been open sourced yet, necessary for handling VCPU hotplug.
> 
> > but I am not yet sure if these
> > patches would break it further.
> 
> Hard to say without seeing those modules.
> 
> I will speculate: assuming that the non-public modules fit together with
> the public modules in some way, I expect the broadcast SMI shouldn't
> break them. The reason is that the broadcast SMI / traditional delivery
> are the default method in UefiCpuPkg, and in practice they work better
> (more reliably) with the rest of the edk2 infrastructure, in my
> experience, than the relaxed sync method.
> 
> In his review today, Jordan wrote
> <https://lists.01.org/pipermail/edk2-devel/2016-November/005088.html>,
> 
> > I'm glad we'll be using a mechanism that broadcasts to all the
> > processors like the real hardware. It is a bit unfortunate that it
> > doesn't go through the b2 port for it.
> 
> If broadcast is how real hardware does it (even by default!,
> apparently), I expect those as-yet unreleased, hotplug-handling modules
> in edk2 should be able to cope with broadcast.
> 
> > The full solution is to follow a protocol similar to what real hardware
> > does.
> 
> Real hardware seems to use broadcast, according to the above...
> 
> On 11/23/16 23:35, Paolo Bonzini wrote:
> 
> > On hot-plug, before the new CPU starts running, the DSDT should
> > generate an SMI (with a well-known value written to 0xB2).
> 
> I'm not sure I understand right. If it is the DSDT that writes to 0xB2,
> that's just another way to say, "the firmware vendor asks the operating
> system to write to 0xB2". If the malicious OS is smart enough, it can
> realize (from the hardware signal to run the ACPI GPE handler, IIRC)
> that a new VCPU is available, and simply not trigger the SMI.
> 
> > The handler
> > then:
> > 
> > 1) waits for SMM rendezvous
> > 
> > 2) unparks the hotplugged VCPU.  Until the VCPU is unparked, it doesn't
> > react to either INITs or of course SIPIs (this would need to be
> > implemented separately for both TCG and KVM! but only in QEMU, not in
> > the kernel).
> 
> Okay, this does plug the hole I sketched out above. This logic (the
> QEMU-specific unparking) can be done in another platform API in OVMF I
> guess (like those in SmmCpuFeaturesLib), but I wonder if we have to
> provide the infrastructure in platform code up to the separate SMI
> command handler. I think it again depends on those unreleased modules.
I don't really like parked CPU idea and related modifications to
CPU hotplug MMIO interface.

How about an alternative approach:

1) on CPU hotplug QEMU generates SMI (if SMIs are enabled)
   it doesn't matter if it's a directed (to being hotplugged CPU) or broadcast 
SMI)
   as hotplugged CPU is in reset state and won't handle SMI until it receives 
SIPI
   (see SDM: 34.2 SYSTEM MANAGEMENT INTERRUPT: NOTES)

2) waits for SMM rendezvous of all known CPUs (i.e except of a hotplugged one)

3) once in rendezvous, a master CPU restores memory at SMBASE to
   a known-good state and sends directed INIT/SIPI to the hotplugged CPU

4) the hotplugged CPU relocates SMBASE as pending SMI is processed before SIPI

5) SIPI handler registers hotplugged CPU in UefiCpuPkg/MpInitLib runtime data 
structures
   and switches CPU into x2APIC mode if necessary
   (maybe this could be done from SMI handler (safer) and SIPI handler could be 
just NOP)

6) control goes to OS that gets and processes GPE interrupt without as chance 
   to do dirty tricks on default SMBASE memory

step 2:
  process could be optimized if SMRR range would cover default SMBASE
  then only master CPU and the hotplugged one have to switch into SMM mode,
  this whole machine doesn't have to be stopped in SMM rendezvous.


BTW:
I don't see how broadcast SMI could be used reliably at initial boot 
as all present APs would race toward the same SMBASE when broadcast
SIPI is sent so it's either directed SMIs or directed SIPIs, I'm not
sure what edk2 actually does.

 
> > 3) relocates SMBASE
> > 
> > 4) records the presence of the new VCPU's APIC id for subsequent SMIs.
> > 
> > 
> > The other important things are:
> > 
> > * new CPU-hotplug controller (docs/specs/acpi_cpu_hotplug.txt)
> > interfaces.  I think this would only need a new bit in the write
> > register at 0x4 ("CPU device control fields").
> > 
> > The 0x0-0x3 write register, currently reserved for reading, might
> > become read/write for easier communication with the SMI handler.  The
> > SMI handler would write 1 to the new bit in order to unpark the VCPU.
> > 
> > * how to enable this.  I think it would need a new SMM feature, just
> > like those that you are adding here, in order to make it negotiable.  If
> > it is not negotiated, but broadcast SMI is negotiated, you'd need to do
> > something such as not allowing CPU-hotplug.  (This is the only part that
> > I think is required for 2.9).
> 
> My hope is that we can work on this incrementally (or at least that we
> don't forfeit our chance at SMM + VCPU hotplug) by going down the
> broadcast SMI route first. Based on what Jordan wrote, this hope should
> not be unfounded.
> 
> Also, the SMM stack (across components: core edk2 code, OVMF platform
> code, QEMU and KVM) is now more than a year old, and we still have
> stability problems (hence this series). I'd like to stabilize the base
> features before getting wild with hotplug.
> 
> (The fact that OVMF lags behind SeaBIOS in a number of features is just
> business as usual.)
> 
> So, I'd vastly prefer if I needed to extend this series only with a way
> to prevent VCPUs from being hotplugged. I think negotiating SMI
> broadcast should be good enough for that, as a hook point, given that it
> runs in the firmware before the OS gets control (on S3 resume too).
> 
> I guess a global variable (or a board-level query function) that would
> cause pc_query_hotpluggable_cpus() to return an empty list would be
> frowned upon; what's the recommended method?
> 
> There is DeviceClass.hotpluggable, but that's not a dynamic property.
> 
> > 
> > In order to trigger the SMI, the
> > 
> >                  ifctx = aml_if(aml_equal(ins_evt, one));
> >                  {
> >                      aml_append(ifctx,
> >                          aml_call2(CPU_NOTIFY_METHOD, cpu_data, dev_chk));
> >                      aml_append(ifctx, aml_store(one, ins_evt));
> >                      aml_append(ifctx, aml_store(one, has_event));
> >                  }
> > 
> > would be replaced by something like this pseudo-AML:
> > 
> >             Store(And(smm_features, 2), Local1)
> >             ...
> >             Store(next_cpu_cmd, cpu_cmd)
> >             If (Equal(ins_evt, One)) {
> >                     If (Greater(Local1, Zero)) {
> >                             Store(CPU_HP_APM_CMD, apm_cmd)
> >                     }
> >                     CPU_NOTIFY_METHOD(cpu_data, dev_chk)
> >                     Store(One, ins_evt)
> >                     Store(One, has_event)
> >             }
> > 
> > 
> > Of course all this is for OVMF only.  SeaBIOS doesn't need to do
> > anything of this, because it actually likes to have its SMIs only on the
> > current CPU (and it had better be the BSP, since SMBASE is not relocated
> > elsewhere!).
> > 
> > Igor, any thoughts?
> > 
> > I understand that this is quite huge in both OVMF and QEMU, but we've
> > only been delaying it and we knew about it. :(
> 
> I agree about being aware of lack of VCPU hotplug support wrt. SMM.
> 
> I disagree about delaying it. I've been aware of problems with the base
> SMM features (of differing importance), for example
> 
> - that S3 resume with the Ia32 build of OVMF was almost hit-or-miss,
> 
> - or that the heavy use of UEFI variable services during Windows 8 boot
> caused user-visible choppiness in the rotating animation (because of how
> our SMIs worked) -- I think I even mentioned this to you.
> 
> Since the stalemate in
> 
>   http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
> 
> from last November, I've never stopped disliking the lack of broadcast
> SMIs, but I couldn't do anything about it, despite it leaving the base
> feature set unreliable.
> 
> Recently, the open source edk2 SMM stack has acquired a bunch of new
> code (you're aware), which exacerbate the instabilities (as I've
> documented on the edk2-devel list in excruciating detail). I was
> overjoyed when you finally suggested (!) to add broadcast SMI, getting
> the discussion un-stuck, because (a) it miraculously fixes everything,
> and (b) honestly, I see no other way from keeping the SMM stack from
> falling apart otherwise, *without* VCPU hotplug in the picture.
> 
> For the last few weeks, I've been working day and night (I'm not
> exaggerating) on testing & reviewing edk2 patches related to SMM or even
> just multiprocessing, catching errors, reporting them, and even
> debugging and fixing them myself. (See for example commits
> 
>   00650c531a90 UefiCpuPkg/MpInitLib/X64/MpFuncs.nasm: fix fatal typo
>   4af3ae146379 UefiCpuPkg/LocalApicLib: fix feature test for Extended
>                Topology CPUID leaf
>   1cbd83308989 UefiCpuPkg/MpInitLib: fix feature test for Extended
>                Topology CPUID leaf
> 
> ). Putting out fires more or less as a norm.
> 
> Furthermore, I haven't been pretending that VCPU hotplug "doesn't
> exist", see
> 
>   https://lists.01.org/pipermail/edk2-devel/2016-November/005131.html
> 
> for example, which I posted independently, ~8 hours ago.
> 
> So, no. I reject the idea that we've been procrastinating SMM enabled
> VCPU hotplug. We can't build the second floor while the foundation is
> shaking. And then we need to ask Intel to release more code as open source.
> 
> Thanks
> Laszlo
> 
> > 
> > Paolo
> > 
> >> Cc: "Kevin O'Connor" <address@hidden>
> >> Cc: "Michael S. Tsirkin" <address@hidden>
> >> Cc: Gerd Hoffmann <address@hidden>
> >> Cc: Paolo Bonzini <address@hidden>
> >>
> >> Thanks
> >> Laszlo
> >>
> >> Laszlo Ersek (3):
> >>   hw/isa/apm: introduce callback for APM_STS_IOPORT writes
> >>   hw/isa/lpc_ich9: add SMI feature negotiation via APM_STS
> >>   hw/isa/lpc_ich9: ICH9_APM_STS_F_BROADCAST_SMI: inject SMI on all VCPUs
> >>
> >>  docs/specs/q35-apm-sts.txt | 80 
> >> ++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/i386/ich9.h     |  9 ++++++
> >>  include/hw/isa/apm.h       |  9 +++---
> >>  hw/acpi/piix4.c            |  2 +-
> >>  hw/isa/apm.c               | 15 ++++++---
> >>  hw/isa/lpc_ich9.c          | 64 +++++++++++++++++++++++++++++++++++--
> >>  hw/isa/vt82c686.c          |  2 +-
> >>  7 files changed, 168 insertions(+), 13 deletions(-)
> >>  create mode 100644 docs/specs/q35-apm-sts.txt
> >>
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]