qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv3] piix: fix up/down races


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCHv3] piix: fix up/down races
Date: Mon, 02 Apr 2012 12:59:54 -0600

On Mon, 2012-04-02 at 13:03 +0300, Michael S. Tsirkin wrote:
> piix acpi interface suffers from the following 2 issues:
> 
> 1.
> - delete device a
> - quickly add device b in another slot
> 
> if we do this before guest reads the down register,
> the down event is discarded and device will never
> be deleted.
> 
> 2.
> - delete device a
> - quickly reset before guest can respond
> 
> interrupt is reset and guest will never eject the device.
> 
> To fix this, we implement three changes:
> 
> 1. Do not clear up/down registers on each hotplug event.
> this ensures we don't lose events.
> 
> 2. Make existing up/down registers write 1 to clear;
> bios will now write back to these the value it read.
> 
> 3. on reset, remove all devices which have DOWN bit set
> 
> For compatibility with old guests, we also clear
> the DOWN bit on write to EJ0 for a device.
> 
> This patch also extends the documentation for the
> ACPI interface, to make the semantics explicit.
> 
> Compatibility notes: migrating guests across qemu versions
> can cause bios from one qemu realease running on another qemu
> release. The following documents the behaviour in this case:
> 
> A. new bios running on old qemu
> A bios implementing the change mentioned above will if running on an old
> qemu introduce a race: the registers were writeable there, so if a
> register changes between read and write, we'll clobber the new bits
> losing events.  As that version tends to lose events anyway, and as the
> better than complicating the interface. If someone cares enough, the fix
> can go on the stable branch.

Seems like it'd be easy to have seabios probe this on boot, ex. write
<value> to 0xae00, read 0xae00 to see if value is reflected.  If it is,
old qemu, modify dsdt to avoid writes.

> B. old bios running on new qemu
> Down bits will get cleared on eject because of 3 above.
> Up bits only get cleared on reset.
> This will cause unnecessary notifications and bus rescans,
> but they are harmless.

Doesn't windows poll these bits.  How can we say that introducing a bus
rescan every poll interval is harmless?

> 
> Signed-off-by: Michael S. Tsirkin <address@hidden>
> ---
> 
> Changes from v2:
> - Use existing register instead of adding a new one,
>   and update documentation
> Changes from v1:
> - tweaked a comment about clearing down register on reset
> - documentation update
> 
>  docs/specs/acpi_pci_hotplug.txt |   55 +++++++++++++++++++++++++++++++-----
>  hw/acpi_piix4.c                 |   58 
> ++++++++++++++++++++++++++++-----------
>  2 files changed, 89 insertions(+), 24 deletions(-)
> 
> diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> index f0f74a7..e4a9556 100644
> --- a/docs/specs/acpi_pci_hotplug.txt
> +++ b/docs/specs/acpi_pci_hotplug.txt
> @@ -7,31 +7,70 @@ describes the interface between QEMU and the ACPI BIOS.
>  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
>  -----------------------------------------
>  
> -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
>  event to ACPI BIOS, via SCI interrupt.
>  
> -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte 
> access):
> +Semantics: this event occurs each time a bit in either Pending PCI slot
> +injection notification or Pending PCI slot removal notification registers
> +changes status from 0 to 1.

Or if your bios doesn't clear it, 1 to 1.

> +
> +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte 
> access):
>  ---------------------------------------------------------------
> -Slot injection notification pending. One bit per slot.
> +Slot injection notification is pending. One bit per slot.
> +When written by Guest, this register implements write one to clear behaviour.

So we're writing the mask of the bits (slots) to be cleared?  Can
multiple slots be cleared in one write?

>  Read by ACPI BIOS GPE.1 handler to notify OS of injection
>  events.
> +Written to by ACPI BIOS GPE.1 handler after reading
> +and before notifying OS of injection events.
> +
> +Semantics: after a slot is populated by hotplug, it is immediately enabled
> +and a bit in this register is set.
> +Writes into this register are used to acknowledge the injection notification
> +and do not not affect the slot status.
> +Reset value: 0
>  
> -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
>  -----------------------------------------------------
> -Slot removal notification pending. One bit per slot.
> +Slot removal notification is pending. One bit per slot.
> +When written by Guest, this register implements write one to clear behaviour.

Same clarification as above.
 
>  Read by ACPI BIOS GPE.1 handler to notify OS of removal
>  events.
>  
> +Written to by ACPI BIOS GPE.1 handler after reading
> +and before notifying OS of removal events.
> +
> +Semantics: upon hotunplug request, a bit in this register is set
> +and the OSPM is notified about hotunplug request, but a slot remains enabled
> +until eject.
> +Writes into this register are used to acknowledge the hotunplug request,
> +and do not not affect the slot status.
> +Reset value: 0
> +
>  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
>  ----------------------------------------
>  
>  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> -Reads return 0.
> +Guests should not read this register. In practice reads return 0.

Isn't mentioning the read value creating a defacto standard?  If we're
not defining we should say reads are undefined.

> +guests should write values with exactly one bit set, selecting
> +the slot to eject.
> +In practice all bits except the lowest bit that is set are discarded.  Also 
> in

practice = standard, define it.

> +practice writing the value 0 into this register has no effect.

ditto.

> +Semantics: selected hotunpluggable slot is disabled and powered off.
> +Has no effect on non-hotunpluggable slots.
> +
> +For compatibility with old BIOS, writes to this register
> +clear bits in pending slot injection notification register.
>  
>  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
>  -----------------------------------------------
>  
> -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> -bit per slot.
> +Used by ACPI BIOS _RMV method to detect removability status
> +and report to OS. One bit per slot.
> +Guests should not write this register, in practice the value
> +written is discarded.

We should define writes as not supported.  "Should not, but ok if you
do" is not a specification.

> +Semantics: selected slots support hotplug. Contents of this register
> +do not change while guest is running.
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 797ed24..389db16 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -287,6 +287,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
>      }
>  }
>  
> +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> +{
> +    DeviceState *qdev, *next;
> +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> +    int slot = ffs(slots) - 1;
> +
> +    /*
> +     * Clear DOWN register - this is good for a case where guest can't
> +     * write to CLR_DOWN because of VM reset, and for compatibility with
> +     * old guests which do not have CLR_DOWN.
> +     */
> +    s->pci0_status.down &= ~(1U << slot);

Should we clear "up" here too?

> +
> +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> +        PCIDevice *dev = PCI_DEVICE(qdev);
> +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> +            qdev_free(qdev);
> +        }
> +    }
> +}
> +
>  static void piix4_reset(void *opaque)
>  {
>      PIIX4PMState *s = opaque;
> @@ -302,6 +324,19 @@ static void piix4_reset(void *opaque)
>          pci_conf[0x5B] = 0x02;
>      }
>      piix4_update_hotplug(s);
> +    /*
> +     * Guest lost remove events if any.
> +     * As it's being reset it's safe to remove the device now.
> +     */
> +    while (s->pci0_status.down) {
> +        acpi_piix_eject_slot(s, s->pci0_status.down);
> +    }
> +    /*
> +     * Guest lost add events if any.
> +     * As it's being reset and will rescan the bus we can discard
> +     * past events now.
> +     */
> +    s->pci0_status.up = 0;
>  }
>  
>  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> @@ -472,10 +507,10 @@ static void pcihotplug_write(void *opaque, uint32_t 
> addr, uint32_t val)
>      struct pci_status *g = opaque;
>      switch (addr) {
>          case PCI_BASE:
> -            g->up = val;
> +            g->up &= ~val;
>              break;
>          case PCI_BASE + 4:
> -            g->down = val;
> +            g->down &= ~val;
>              break;
>     }

Ok, so it is a bit(slot) mask and can clear multiple bits.

> @@ -490,19 +525,12 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
>  
>  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
>  {
> -    BusState *bus = opaque;
> -    DeviceState *qdev, *next;
> -    int slot = ffs(val) - 1;
> +    PIIX4PMState *s = opaque;
>  
> -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> -        PCIDevice *dev = PCI_DEVICE(qdev);
> -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> -            qdev_free(qdev);
> -        }
> +    if (val) {
> +        acpi_piix_eject_slot(s, val);
>      }
>  
> -
>      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
>  }
>  
> @@ -532,8 +560,8 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, 
> PIIX4PMState *s)
>      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>  
> -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
>  
>      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
>      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> @@ -567,8 +595,6 @@ static int piix4_device_hotplug(DeviceState *qdev, 
> PCIDevice *dev,
>          return 0;
>      }
>  
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
>      if (state == PCI_HOTPLUG_ENABLED) {
>          enable_device(s, slot);
>      } else {

So if we have an old bios and do an add, followed by a remove, guest
ACPI finds both "up" and "down" set for the slot and we rely on the
ordering of the AML checking up before down to keep the duct tape and
bailing wire from exploding?  Hmm, can't say I'm excited about this hack
either.  Thanks,

Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]