qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv3] piix: fix up/down races


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCHv3] piix: fix up/down races
Date: Mon, 2 Apr 2012 22:20:32 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Apr 02, 2012 at 12:59:54PM -0600, Alex Williamson wrote:
> On Mon, 2012-04-02 at 13:03 +0300, Michael S. Tsirkin wrote:
> > piix acpi interface suffers from the following 2 issues:
> > 
> > 1.
> > - delete device a
> > - quickly add device b in another slot
> > 
> > if we do this before guest reads the down register,
> > the down event is discarded and device will never
> > be deleted.
> > 
> > 2.
> > - delete device a
> > - quickly reset before guest can respond
> > 
> > interrupt is reset and guest will never eject the device.
> > 
> > To fix this, we implement three changes:
> > 
> > 1. Do not clear up/down registers on each hotplug event.
> > this ensures we don't lose events.
> > 
> > 2. Make existing up/down registers write 1 to clear;
> > bios will now write back to these the value it read.
> > 
> > 3. on reset, remove all devices which have DOWN bit set
> > 
> > For compatibility with old guests, we also clear
> > the DOWN bit on write to EJ0 for a device.
> > 
> > This patch also extends the documentation for the
> > ACPI interface, to make the semantics explicit.
> > 
> > Compatibility notes: migrating guests across qemu versions
> > can cause bios from one qemu realease running on another qemu
> > release. The following documents the behaviour in this case:
> > 
> > A. new bios running on old qemu
> > A bios implementing the change mentioned above will if running on an old
> > qemu introduce a race: the registers were writeable there, so if a
> > register changes between read and write, we'll clobber the new bits
> > losing events.  As that version tends to lose events anyway, and as the
> > better than complicating the interface. If someone cares enough, the fix
> > can go on the stable branch.
> 
> Seems like it'd be easy to have seabios probe this on boot, ex. write
> <value> to 0xae00, read 0xae00 to see if value is reflected.  If it is,
> old qemu, modify dsdt to avoid writes.

This won't help at all: the issue is when we migrate to
old qemu after boot.

> > B. old bios running on new qemu
> > Down bits will get cleared on eject because of 3 above.
> > Up bits only get cleared on reset.
> > This will cause unnecessary notifications and bus rescans,
> > but they are harmless.
> 
> Doesn't windows poll these bits.

No.

>  How can we say that introducing a bus
> rescan every poll interval is harmless?

It's not done every poll interval - all this does
is cause extra rescans when system interrupt is asserted.

> > 
> > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > ---
> > 
> > Changes from v2:
> > - Use existing register instead of adding a new one,
> >   and update documentation
> > Changes from v1:
> > - tweaked a comment about clearing down register on reset
> > - documentation update
> > 
> >  docs/specs/acpi_pci_hotplug.txt |   55 +++++++++++++++++++++++++++++++-----
> >  hw/acpi_piix4.c                 |   58 
> > ++++++++++++++++++++++++++++-----------
> >  2 files changed, 89 insertions(+), 24 deletions(-)
> > 
> > diff --git a/docs/specs/acpi_pci_hotplug.txt 
> > b/docs/specs/acpi_pci_hotplug.txt
> > index f0f74a7..e4a9556 100644
> > --- a/docs/specs/acpi_pci_hotplug.txt
> > +++ b/docs/specs/acpi_pci_hotplug.txt
> > @@ -7,31 +7,70 @@ describes the interface between QEMU and the ACPI BIOS.
> >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> >  -----------------------------------------
> >  
> > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI 
> > hotplug/hotunplug
> >  event to ACPI BIOS, via SCI interrupt.
> >  
> > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte 
> > access):
> > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > +injection notification or Pending PCI slot removal notification registers
> > +changes status from 0 to 1.
> 
> Or if your bios doesn't clear it, 1 to 1.
> 
> > +
> > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte 
> > access):
> >  ---------------------------------------------------------------
> > -Slot injection notification pending. One bit per slot.
> > +Slot injection notification is pending. One bit per slot.
> > +When written by Guest, this register implements write one to clear 
> > behaviour.
> 
> So we're writing the mask of the bits (slots) to be cleared?  Can
> multiple slots be cleared in one write?
> 
> >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> >  events.
> > +Written to by ACPI BIOS GPE.1 handler after reading
> > +and before notifying OS of injection events.
> > +
> > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > +and a bit in this register is set.
> > +Writes into this register are used to acknowledge the injection 
> > notification
> > +and do not not affect the slot status.
> > +Reset value: 0
> >  
> > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte 
> > access):
> >  -----------------------------------------------------
> > -Slot removal notification pending. One bit per slot.
> > +Slot removal notification is pending. One bit per slot.
> > +When written by Guest, this register implements write one to clear 
> > behaviour.
> 
> Same clarification as above.
>  
> >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> >  events.
> >  
> > +Written to by ACPI BIOS GPE.1 handler after reading
> > +and before notifying OS of removal events.
> > +
> > +Semantics: upon hotunplug request, a bit in this register is set
> > +and the OSPM is notified about hotunplug request, but a slot remains 
> > enabled
> > +until eject.
> > +Writes into this register are used to acknowledge the hotunplug request,
> > +and do not not affect the slot status.
> > +Reset value: 0
> > +
> >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> >  ----------------------------------------
> >  
> >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > -Reads return 0.
> > +Guests should not read this register. In practice reads return 0.
> 
> Isn't mentioning the read value creating a defacto standard?  If we're
> not defining we should say reads are undefined.
> 
> > +guests should write values with exactly one bit set, selecting
> > +the slot to eject.
> > +In practice all bits except the lowest bit that is set are discarded.  
> > Also in
> 
> practice = standard, define it.
> 
> > +practice writing the value 0 into this register has no effect.
> 
> ditto.
> 
> > +Semantics: selected hotunpluggable slot is disabled and powered off.
> > +Has no effect on non-hotunpluggable slots.
> > +
> > +For compatibility with old BIOS, writes to this register
> > +clear bits in pending slot injection notification register.
> >  
> >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> >  -----------------------------------------------
> >  
> > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > -bit per slot.
> > +Used by ACPI BIOS _RMV method to detect removability status
> > +and report to OS. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written is discarded.
> 
> We should define writes as not supported.  "Should not, but ok if you
> do" is not a specification.
> 
> > +Semantics: selected slots support hotplug. Contents of this register
> > +do not change while guest is running.
> > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > index 797ed24..389db16 100644
> > --- a/hw/acpi_piix4.c
> > +++ b/hw/acpi_piix4.c
> > @@ -287,6 +287,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> >      }
> >  }
> >  
> > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > +{
> > +    DeviceState *qdev, *next;
> > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > +    int slot = ffs(slots) - 1;
> > +
> > +    /*
> > +     * Clear DOWN register - this is good for a case where guest can't
> > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > +     * old guests which do not have CLR_DOWN.
> > +     */
> > +    s->pci0_status.down &= ~(1U << slot);
> 
> Should we clear "up" here too?
> 
> > +
> > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > +            qdev_free(qdev);
> > +        }
> > +    }
> > +}
> > +
> >  static void piix4_reset(void *opaque)
> >  {
> >      PIIX4PMState *s = opaque;
> > @@ -302,6 +324,19 @@ static void piix4_reset(void *opaque)
> >          pci_conf[0x5B] = 0x02;
> >      }
> >      piix4_update_hotplug(s);
> > +    /*
> > +     * Guest lost remove events if any.
> > +     * As it's being reset it's safe to remove the device now.
> > +     */
> > +    while (s->pci0_status.down) {
> > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > +    }
> > +    /*
> > +     * Guest lost add events if any.
> > +     * As it's being reset and will rescan the bus we can discard
> > +     * past events now.
> > +     */
> > +    s->pci0_status.up = 0;
> >  }
> >  
> >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > @@ -472,10 +507,10 @@ static void pcihotplug_write(void *opaque, uint32_t 
> > addr, uint32_t val)
> >      struct pci_status *g = opaque;
> >      switch (addr) {
> >          case PCI_BASE:
> > -            g->up = val;
> > +            g->up &= ~val;
> >              break;
> >          case PCI_BASE + 4:
> > -            g->down = val;
> > +            g->down &= ~val;
> >              break;
> >     }
> 
> Ok, so it is a bit(slot) mask and can clear multiple bits.
> 
> > @@ -490,19 +525,12 @@ static uint32_t pciej_read(void *opaque, uint32_t 
> > addr)
> >  
> >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> >  {
> > -    BusState *bus = opaque;
> > -    DeviceState *qdev, *next;
> > -    int slot = ffs(val) - 1;
> > +    PIIX4PMState *s = opaque;
> >  
> > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > -            qdev_free(qdev);
> > -        }
> > +    if (val) {
> > +        acpi_piix_eject_slot(s, val);
> >      }
> >  
> > -
> >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> >  }
> >  
> > @@ -532,8 +560,8 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, 
> > PIIX4PMState *s)
> >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> >  
> > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> >  
> >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> > @@ -567,8 +595,6 @@ static int piix4_device_hotplug(DeviceState *qdev, 
> > PCIDevice *dev,
> >          return 0;
> >      }
> >  
> > -    s->pci0_status.up = 0;
> > -    s->pci0_status.down = 0;
> >      if (state == PCI_HOTPLUG_ENABLED) {
> >          enable_device(s, slot);
> >      } else {
> 
> So if we have an old bios and do an add, followed by a remove, guest
> ACPI finds both "up" and "down" set for the slot and we rely on the
> ordering of the AML checking up before down to keep the duct tape and
> bailing wire from exploding?  Hmm, can't say I'm excited about this hack
> either.  Thanks,
> 
> Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]