qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv3] piix: fix up/down races


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCHv3] piix: fix up/down races
Date: Tue, 03 Apr 2012 02:14:33 -0400 (EDT)

----- Original Message -----
> From: "Michael S. Tsirkin" <address@hidden>
> To: "Alex Williamson" <address@hidden>
> Cc: address@hidden, "Anthony Liguori" <address@hidden>, "Gerd Hoffmann" 
> <address@hidden>, "Isaku
> Yamahata" <address@hidden>, "Aurelien Jarno" <address@hidden>, "Paolo 
> Bonzini" <address@hidden>,
> address@hidden, address@hidden
> Sent: Monday, April 2, 2012 9:20:32 PM
> Subject: Re: [PATCHv3] piix: fix up/down races
> 
> On Mon, Apr 02, 2012 at 12:59:54PM -0600, Alex Williamson wrote:
> > On Mon, 2012-04-02 at 13:03 +0300, Michael S. Tsirkin wrote:
> > > piix acpi interface suffers from the following 2 issues:
> > > 
> > > 1.
> > > - delete device a
> > > - quickly add device b in another slot
> > > 
> > > if we do this before guest reads the down register,
> > > the down event is discarded and device will never
> > > be deleted.
> > > 
> > > 2.
> > > - delete device a
> > > - quickly reset before guest can respond
> > > 
> > > interrupt is reset and guest will never eject the device.
> > > 
> > > To fix this, we implement three changes:
> > > 
> > > 1. Do not clear up/down registers on each hotplug event.
> > > this ensures we don't lose events.
> > > 
> > > 2. Make existing up/down registers write 1 to clear;
> > > bios will now write back to these the value it read.
> > > 
> > > 3. on reset, remove all devices which have DOWN bit set
> > > 
> > > For compatibility with old guests, we also clear
> > > the DOWN bit on write to EJ0 for a device.
> > > 
> > > This patch also extends the documentation for the
> > > ACPI interface, to make the semantics explicit.
> > > 
> > > Compatibility notes: migrating guests across qemu versions
> > > can cause bios from one qemu realease running on another qemu
> > > release. The following documents the behaviour in this case:
> > > 
> > > A. new bios running on old qemu
> > > A bios implementing the change mentioned above will if running on
> > > an old
> > > qemu introduce a race: the registers were writeable there, so if
> > > a
> > > register changes between read and write, we'll clobber the new
> > > bits
> > > losing events.  As that version tends to lose events anyway, and
> > > as the
> > > better than complicating the interface. If someone cares enough,
> > > the fix
> > > can go on the stable branch.
> > 
> > Seems like it'd be easy to have seabios probe this on boot, ex.
> > write
> > <value> to 0xae00, read 0xae00 to see if value is reflected.  If it
> > is,
> > old qemu, modify dsdt to avoid writes.
> 
> This won't help at all: the issue is when we migrate to
> old qemu after boot.
> 
> > > B. old bios running on new qemu
> > > Down bits will get cleared on eject because of 3 above.
> > > Up bits only get cleared on reset.
> > > This will cause unnecessary notifications and bus rescans,
> > > but they are harmless.
> > 
> > Doesn't windows poll these bits.
> 
> No.
> 
> >  How can we say that introducing a bus
> > rescan every poll interval is harmless?
> 
> It's not done every poll interval - all this does
> is cause extra rescans when system interrupt is asserted.
it will do rescan only when PIIX4_PCI_HOTPLUG_STATUS in 
gpe.sts is set.
  
> 
> > > 
> > > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > > ---
> > > 
> > > Changes from v2:
> > > - Use existing register instead of adding a new one,
> > >   and update documentation
> > > Changes from v1:
> > > - tweaked a comment about clearing down register on reset
> > > - documentation update
> > > 
> > >  docs/specs/acpi_pci_hotplug.txt |   55
> > >  +++++++++++++++++++++++++++++++-----
> > >  hw/acpi_piix4.c                 |   58
> > >  ++++++++++++++++++++++++++++-----------
> > >  2 files changed, 89 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/docs/specs/acpi_pci_hotplug.txt
> > > b/docs/specs/acpi_pci_hotplug.txt
> > > index f0f74a7..e4a9556 100644
> > > --- a/docs/specs/acpi_pci_hotplug.txt
> > > +++ b/docs/specs/acpi_pci_hotplug.txt
> > > @@ -7,31 +7,70 @@ describes the interface between QEMU and the
> > > ACPI BIOS.
> > >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> > >  -----------------------------------------
> > >  
> > > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI
> > > hotplug/eject
> > > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI
> > > hotplug/hotunplug
> > >  event to ACPI BIOS, via SCI interrupt.
> > >  
> > > -PCI slot injection notification pending (IO port 0xae00-0xae03,
> > > 4-byte access):
> > > +Semantics: this event occurs each time a bit in either Pending
> > > PCI slot
> > > +injection notification or Pending PCI slot removal notification
> > > registers
> > > +changes status from 0 to 1.
> > 
> > Or if your bios doesn't clear it, 1 to 1.
> > 
> > > +
> > > +Pending PCI slot injection notification (IO port 0xae00-0xae03,
> > > 4-byte access):
> > >  ---------------------------------------------------------------
> > > -Slot injection notification pending. One bit per slot.
> > > +Slot injection notification is pending. One bit per slot.
> > > +When written by Guest, this register implements write one to
> > > clear behaviour.
> > 
> > So we're writing the mask of the bits (slots) to be cleared?  Can
> > multiple slots be cleared in one write?
> > 
> > >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> > >  events.
> > > +Written to by ACPI BIOS GPE.1 handler after reading
> > > +and before notifying OS of injection events.
> > > +
> > > +Semantics: after a slot is populated by hotplug, it is
> > > immediately enabled
> > > +and a bit in this register is set.
> > > +Writes into this register are used to acknowledge the injection
> > > notification
> > > +and do not not affect the slot status.
> > > +Reset value: 0
> > >  
> > > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte
> > > access):
> > > +Pending PCI slot removal notification (IO port 0xae04-0xae07,
> > > 4-byte access):
> > >  -----------------------------------------------------
> > > -Slot removal notification pending. One bit per slot.
> > > +Slot removal notification is pending. One bit per slot.
> > > +When written by Guest, this register implements write one to
> > > clear behaviour.
> > 
> > Same clarification as above.
> >  
> > >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> > >  events.
> > >  
> > > +Written to by ACPI BIOS GPE.1 handler after reading
> > > +and before notifying OS of removal events.
> > > +
> > > +Semantics: upon hotunplug request, a bit in this register is set
> > > +and the OSPM is notified about hotunplug request, but a slot
> > > remains enabled
> > > +until eject.
> > > +Writes into this register are used to acknowledge the hotunplug
> > > request,
> > > +and do not not affect the slot status.
> > > +Reset value: 0
> > > +
> > >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> > >  ----------------------------------------
> > >  
> > >  Used by ACPI BIOS _EJ0 method to request device removal. One bit
> > >  per slot.
> > > -Reads return 0.
> > > +Guests should not read this register. In practice reads return
> > > 0.
> > 
> > Isn't mentioning the read value creating a defacto standard?  If
> > we're
> > not defining we should say reads are undefined.
> > 
> > > +guests should write values with exactly one bit set, selecting
> > > +the slot to eject.
> > > +In practice all bits except the lowest bit that is set are
> > > discarded.  Also in
> > 
> > practice = standard, define it.
> > 
> > > +practice writing the value 0 into this register has no effect.
> > 
> > ditto.
> > 
> > > +Semantics: selected hotunpluggable slot is disabled and powered
> > > off.
> > > +Has no effect on non-hotunpluggable slots.
> > > +
> > > +For compatibility with old BIOS, writes to this register
> > > +clear bits in pending slot injection notification register.
> > >  
> > >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> > >  -----------------------------------------------
> > >  
> > > -Used by ACPI BIOS _RMV method to indicate removability status to
> > > OS. One
> > > -bit per slot.
> > > +Used by ACPI BIOS _RMV method to detect removability status
> > > +and report to OS. One bit per slot.
> > > +Guests should not write this register, in practice the value
> > > +written is discarded.
> > 
> > We should define writes as not supported.  "Should not, but ok if
> > you
> > do" is not a specification.
> > 
> > > +Semantics: selected slots support hotplug. Contents of this
> > > register
> > > +do not change while guest is running.
> > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > > index 797ed24..389db16 100644
> > > --- a/hw/acpi_piix4.c
> > > +++ b/hw/acpi_piix4.c
> > > @@ -287,6 +287,28 @@ static void
> > > piix4_update_hotplug(PIIX4PMState *s)
> > >      }
> > >  }
> > >  
> > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned
> > > slots)
> > > +{
> > > +    DeviceState *qdev, *next;
> > > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > > +    int slot = ffs(slots) - 1;
> > > +
> > > +    /*
> > > +     * Clear DOWN register - this is good for a case where guest
> > > can't
> > > +     * write to CLR_DOWN because of VM reset, and for
> > > compatibility with
> > > +     * old guests which do not have CLR_DOWN.
> > > +     */
> > > +    s->pci0_status.down &= ~(1U << slot);
> > 
> > Should we clear "up" here too?
> > 
> > > +
> > > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > +            qdev_free(qdev);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  static void piix4_reset(void *opaque)
> > >  {
> > >      PIIX4PMState *s = opaque;
> > > @@ -302,6 +324,19 @@ static void piix4_reset(void *opaque)
> > >          pci_conf[0x5B] = 0x02;
> > >      }
> > >      piix4_update_hotplug(s);
> > > +    /*
> > > +     * Guest lost remove events if any.
> > > +     * As it's being reset it's safe to remove the device now.
> > > +     */
> > > +    while (s->pci0_status.down) {
> > > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > > +    }
> > > +    /*
> > > +     * Guest lost add events if any.
> > > +     * As it's being reset and will rescan the bus we can
> > > discard
> > > +     * past events now.
> > > +     */
> > > +    s->pci0_status.up = 0;
> > >  }
> > >  
> > >  static void piix4_powerdown(void *opaque, int irq, int
> > >  power_failing)
> > > @@ -472,10 +507,10 @@ static void pcihotplug_write(void *opaque,
> > > uint32_t addr, uint32_t val)
> > >      struct pci_status *g = opaque;
> > >      switch (addr) {
> > >          case PCI_BASE:
> > > -            g->up = val;
> > > +            g->up &= ~val;
> > >              break;
> > >          case PCI_BASE + 4:
> > > -            g->down = val;
> > > +            g->down &= ~val;
> > >              break;
> > >     }
> > 
> > Ok, so it is a bit(slot) mask and can clear multiple bits.
> > 
> > > @@ -490,19 +525,12 @@ static uint32_t pciej_read(void *opaque,
> > > uint32_t addr)
> > >  
> > >  static void pciej_write(void *opaque, uint32_t addr, uint32_t
> > >  val)
> > >  {
> > > -    BusState *bus = opaque;
> > > -    DeviceState *qdev, *next;
> > > -    int slot = ffs(val) - 1;
> > > +    PIIX4PMState *s = opaque;
> > >  
> > > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > -            qdev_free(qdev);
> > > -        }
> > > +    if (val) {
> > > +        acpi_piix_eject_slot(s, val);
> > >      }
> > >  
> > > -
> > >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> > >  }
> > >  
> > > @@ -532,8 +560,8 @@ static void
> > > piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> > >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write,
> > >      pci0_status);
> > >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read,
> > >      pci0_status);
> > >  
> > > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> > >  
> > >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> > >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> > > @@ -567,8 +595,6 @@ static int piix4_device_hotplug(DeviceState
> > > *qdev, PCIDevice *dev,
> > >          return 0;
> > >      }
> > >  
> > > -    s->pci0_status.up = 0;
> > > -    s->pci0_status.down = 0;
> > >      if (state == PCI_HOTPLUG_ENABLED) {
> > >          enable_device(s, slot);
> > >      } else {
> > 
> > So if we have an old bios and do an add, followed by a remove,
> > guest
> > ACPI finds both "up" and "down" set for the slot and we rely on the
> > ordering of the AML checking up before down to keep the duct tape
> > and
> > bailing wire from exploding?  Hmm, can't say I'm excited about this
> > hack
> > either.  Thanks,
> > 
> > Alex
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]