qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/3] VCPU hotplug support


From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH 2/3] VCPU hotplug support
Date: Mon, 23 Jan 2012 18:55:18 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-01-23 17:29, Igor Mammedov wrote:
> On 01/17/2012 03:17 PM, Jan Kiszka wrote:
>> On 2012-01-17 14:17, Igor Mammedov wrote:
>>> Rebase of the missing bits from qemu-kvm for vcpu hotplug
>>
>> Description, please. Please try to split up, at least into PIIX4
>> preparations and "the rest". Maybe also a patch for a generic CPU
>> hotplug infrastructure.
>>
>>>
>>> Signed-off-by: Igor Mammedov<address@hidden>
>>> ---
>>>   Makefile.objs     |    2 +-
>>>   Makefile.target   |    2 +-
>>>   hw/acpi_piix4.c   |   83 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>   hw/pc.c           |   21 +++++++++++++-
>>>   hw/pc_piix.c      |    4 ++
>>>   sysemu.h          |    2 +
>>>   target-i386/cpu.h |    1 +
>>>   7 files changed, 108 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 45df666..2a8ccc1 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -212,7 +212,7 @@ hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o
>>>   hw-obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>>   hw-obj-$(CONFIG_USB_EHCI) += usb-ehci.o
>>>   hw-obj-$(CONFIG_FDC) += fdc.o
>>> -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o icc_bus.o
>>> +hw-obj-$(CONFIG_ACPI) += acpi.o icc_bus.o
>>>   hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
>>>   hw-obj-$(CONFIG_DMA) += dma.o
>>>   hw-obj-$(CONFIG_HPET) += hpet.o
>>> diff --git a/Makefile.target b/Makefile.target
>>> index 06d79b8..4865dde 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -226,7 +226,7 @@ obj-y += device-hotplug.o
>>>   # Hardware support
>>>   obj-i386-y += vga.o
>>>   obj-i386-y += mc146818rtc.o pc.o
>>> -obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
>>> +obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o acpi_piix4.o
>>
>> That's a qemu-kvm hack, see below for the discussion. No-go for upstream
>> - likely.
>>
>>>   obj-i386-y += vmport.o
>>>   obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
>>>   obj-i386-y += debugcon.o multiboot.o
>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
>>> index bdc55a1..f1cdcc1 100644
>>> --- a/hw/acpi_piix4.c
>>> +++ b/hw/acpi_piix4.c
>>> @@ -39,13 +39,19 @@
>>>   #define ACPI_DBG_IO_ADDR  0xb044
>>>
>>>   #define GPE_BASE 0xafe0
>>> +#define PROC_BASE 0xaf00
>>>   #define GPE_LEN 4
>>>   #define PCI_BASE 0xae00
>>>   #define PCI_EJ_BASE 0xae08
>>>   #define PCI_RMV_BASE 0xae0c
>>>
>>> +#define PIIX4_CPU_HOTPLUG_STATUS 4
>>>   #define PIIX4_PCI_HOTPLUG_STATUS 2
>>>
>>> +struct gpe_regs {
>>> +    uint8_t cpus_sts[32];
>>> +};
>>> +
>>>   struct pci_status {
>>>       uint32_t up;
>>>       uint32_t down;
>>> @@ -71,6 +77,7 @@ typedef struct PIIX4PMState {
>>>
>>>       /* for pci hotplug */
>>>       ACPIGPE gpe;
>>> +    struct gpe_regs gpe_cpu;
>>>       struct pci_status pci0_status;
>>>       uint32_t pci0_hotplug_enable;
>>>   } PIIX4PMState;
>>> @@ -90,9 +97,11 @@ static void pm_update_sci(PIIX4PMState *s)
>>>                      ACPI_BITMASK_POWER_BUTTON_ENABLE |
>>>                      ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
>>>                      ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
>>> -        (((s->gpe.sts[0]&  s->gpe.en[0])&  PIIX4_PCI_HOTPLUG_STATUS) != 0);
>>> +        (((s->gpe.sts[0]&  s->gpe.en[0])&
>>> +         (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS)) != 0);
>>>
>>>       qemu_set_irq(s->irq, sci_level);
>>> +
>>>       /* schedule a timer interruption if needed */
>>>       acpi_pm_tmr_update(&s->tmr, (s->pm1a.en&  ACPI_BITMASK_TIMER_ENABLE)&&
>>>                          !(pmsts&  ACPI_BITMASK_TIMER_STATUS));
>>> @@ -219,10 +228,9 @@ static int vmstate_acpi_post_load(void *opaque, int 
>>> version_id)
>>>    {                                                                   \
>>>        .name       = (stringify(_field)),                              \
>>>        .version_id = 0,                                                \
>>> -     .num        = GPE_LEN,                                          \
>>>        .info       =&vmstate_info_uint16,                             \
>>>        .size       = sizeof(uint16_t),                                 \
>>> -     .flags      = VMS_ARRAY | VMS_POINTER,                          \
>>> +     .flags      = VMS_SINGLE | VMS_POINTER,                         \
>>
>> Does this make the vmstate incompatible to the current version?
> I suspect it makes it incompatible, but not sure what to do about it.
> According to b2e4a396f7 in qemu-kvm it fixes migration bug. It probably
> should be an independed patch not related to hotplug since qemu.git has
> commit 23910d3f6 that introduced the code there and it applies to gpe
> struct in general.

So this hunks converts an array back to a single uint16_t entry - unless
I'm now totally confused about VMSTATE_GPE_ARRAY. CC'ing Juan in the
hope he can parse it for us. That might be a bug in upstream then. But
it should definitely be address in a separate patch.

> 
>>
>>>        .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
>>>    }
>>>
>>> @@ -329,11 +337,16 @@ static void piix4_pm_machine_ready(Notifier *n, void 
>>> *opaque)
>>>
>>>   }
>>>
>>> +static PIIX4PMState *global_piix4_pm_state; /* cpu hotadd */
>>> +
>>>   static int piix4_pm_initfn(PCIDevice *dev)
>>>   {
>>>       PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, dev);
>>>       uint8_t *pci_conf;
>>>
>>> +    /* for cpu hotadd */
>>> +    global_piix4_pm_state = s;
>>> +
>>>       pci_conf = s->dev.config;
>>>       pci_conf[0x06] = 0x80;
>>>       pci_conf[0x07] = 0x02;
>>> @@ -425,7 +438,16 @@ device_init(piix4_pm_register);
>>>   static uint32_t gpe_readb(void *opaque, uint32_t addr)
>>>   {
>>>       PIIX4PMState *s = opaque;
>>> -    uint32_t val = acpi_gpe_ioport_readb(&s->gpe, addr);
>>> +    uint32_t val = 0;
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +
>>> +    switch (addr) {
>>> +    case PROC_BASE ... PROC_BASE+31:
>>> +        val = g->cpus_sts[addr - PROC_BASE];
>>> +        break;
>>> +    default:
>>> +        val = acpi_gpe_ioport_readb(&s->gpe, addr);
>>> +    }
>>>
>>>       PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
>>>       return val;
>>> @@ -519,11 +541,20 @@ static int piix4_device_hotplug(DeviceState *qdev, 
>>> PCIDevice *dev,
>>>   static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>>>   {
>>>       struct pci_status *pci0_status =&s->pci0_status;
>>> +    int i = 0, cpus = smp_cpus;
>>> +
>>> +    while (cpus>  0) {
>>> +        s->gpe_cpu.cpus_sts[i++] = (cpus<  8) ? (1<<  cpus) - 1 : 0xff;
>>> +        cpus -= 8;
>>> +    }
>>>
>>>       register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
>>>       register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
>>>       acpi_gpe_blk(&s->gpe, GPE_BASE);
>>>
>>> +    register_ioport_write(PROC_BASE, 32, 1, gpe_writeb, s);
>>> +    register_ioport_read(PROC_BASE, 32, 1,  gpe_readb, s);
>>> +
>>>       register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>>>       register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>>>
>>> @@ -536,6 +567,50 @@ static void piix4_acpi_system_hot_add_init(PCIBus 
>>> *bus, PIIX4PMState *s)
>>>       pci_bus_hotplug(bus, piix4_device_hotplug,&s->dev.qdev);
>>>   }
>>>
>>> +extern const char *global_cpu_model;
>>> +
>>> +#ifdef TARGET_I386
>>
>> Why only TARGET_I386? If the PIIX4 is used on other targets, the
> 
> May be there is no sence in spending time on abstacting PIIX4 that
> will be used only by x86 target. Is there an even remote chance that PIIX4
> could/will be used with other targets?

It's used on MIPS. At least it is built for that target.

> 
> 
>> infrastructure should be prepared to enable hotplugging for them as well
>> (at a later point). pc_new_cpu is a bad name then. And APIC fiddling
>> should be pushed into the arch-specific new-cpu handler.
> 
> I've started to implement 'new-cpu' as you suggested but could you clarify 
> what
> you've meant under "APIC fiddling"?

I meant cpuid_apic_id initialization.

> Is there an arch specific place for pc like hw to put this in (pc_piix.c 
> perhaps)?

Generic PC hardware bits should go to pc.c.

> 
>> Also note that target dependencies are a no-go for hwlib compilations
>> which is clearly preferrable today.
> I guess that target specific deps are here because of the code below
> needed access to CPUState of specific target, with proper abstraction
> this deps could be avoided.

Yep, that would be good.

> 
>>
>>> +static void enable_processor(PIIX4PMState *s, int cpu)
>>> +{
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +    ACPIGPE *gpe =&s->gpe;
>>> +
>>> +    *gpe->sts = *gpe->sts | PIIX4_CPU_HOTPLUG_STATUS;
>>> +    g->cpus_sts[cpu/8] |= (1<<  (cpu%8));
>>
>> "cpu / 8", "cpu % 8", please. Here and below. checkpatch doesn't complain?
> 
> checkpatch was happy, I'll fix it.
> 
>>
>>> +}
>>> +
>>> +static void disable_processor(PIIX4PMState *s, int cpu)
>>> +{
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +    ACPIGPE *gpe =&s->gpe;
>>> +
>>> +    *gpe->sts = *gpe->sts | PIIX4_CPU_HOTPLUG_STATUS;
>>> +    g->cpus_sts[cpu/8]&= ~(1<<  (cpu%8));
>>> +}
>>> +
>>> +void qemu_system_cpu_hot_add(int cpu, int state)
>>> +{
>>> +    CPUState *env;
>>> +    PIIX4PMState *s = global_piix4_pm_state;
>>> +
>>> +    if (state&&  !qemu_get_cpu(cpu)) {
>>> +        env = pc_new_cpu(global_cpu_model);
>>> +        if (!env) {
>>> +            fprintf(stderr, "cpu %d creation failed\n", cpu);
>>> +            return;
>>> +        }
>>> +        env->cpuid_apic_id = cpu;
>>
>> See comment above about proper target abstraction.
>>
> 
> Let suppose that we have on command line option "-smp 1,maxcpus=3"
> and then call
>    qemu_system_cpu_hot_add( 2, 1)
>    qemu_system_cpu_hot_add( 2, 1)
> and we end up with
>    [cpu_index = 1, cpuid_apic_id = 2],
>    [cpu_index = 2, cpuid_apic_id = 2]
> qemu_get_cpu(cpu) uses cpu_index field to lookup vcpu and
> cpu_init -> cpu_x86_init -> cpu_exec_init doesn't have an ability
> to create vcpu with specific cpu_index and numbers them according
> to position in vcpu list. So we end up with 2 new vcpus with the
> same cpuid_apic_id. That for sure might confuse guest since cpuid
> for last 2 cpus will return the same value. And I'm not sure what
> will happen if we call
>    qemu_system_cpu_hot_add( 2, 1)
>    qemu_system_cpu_hot_add( 1, 1)
> and end up with
>    [cpu_index = 1, cpuid_apic_id = 2],
>    [cpu_index = 2, cpuid_apic_id = 1]
> I don't know what kind of trouble this could cause.
> 
> It seams that "env->cpuid_apic_id = cpu" is pointless especcialy
> taking in account that in cpu_x86_init cpuid_apic_id is initialized
> by cpu_index.
> What we gain in having cpuid_apic_id that actually duplicate cpu_index?
> May be there is sence to get rid of cpuid_apic_id?

cpu_index is for internal counting (I think to remember that,
cpuid_apic_id is the ID exposed to the guest. During CPU hotplug, you
can control this by virtually plugging the CPU in a specific slot. So we
need to pass this ID down the init chain - just not set it in generic code.

> 
> Another question is about how hot-plug/unplug should be designed:
> 1st approach:
>     With the current code we can't create vcpu with specific index.

Forget about index, the apic_id is important to control. And that could
become something like -cpu XXX,apid_id=N. Of course, collisions need to
be detected and rejected.

>     But we can implement xen like approach, where hot-plug command says
>     which amount of active vcpus guest should have. This way we can
>     leave current cpu_init -> cpu_x86_init -> cpu_exec_init call
>     chain without change and plug/unplug next/last vcpu.
> 
> 2nd approach:
>     Ability to plug/unplug individual vcpus based on their cpu_index.
>     to do this we need add cpu_index argument to cpu_init ->
>     cpu_x86_init -> cpu_exec_init call chain. It will look more
>     like the real hardware cpu hot-plug, but do virtual guests really
>     need it. And what for if this way is more preferrable than the 1st.
> 
>>> +    }
>>> +
>>> +    if (state) {
>>> +        enable_processor(s, cpu);
>>> +    } else {
>>> +        disable_processor(s, cpu);
>>> +    }
>>> +    pm_update_sci(s);
>>> +}
>>> +#endif
>>> +
>>>   static void enable_device(PIIX4PMState *s, int slot)
>>>   {
>>>       s->gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
>>> diff --git a/hw/pc.c b/hw/pc.c
>>> index 33d8090..c7342d8 100644
>>> --- a/hw/pc.c
>>> +++ b/hw/pc.c
>>> @@ -44,6 +44,8 @@
>>>   #include "ui/qemu-spice.h"
>>>   #include "memory.h"
>>>   #include "exec-memory.h"
>>> +#include "cpus.h"
>>> +#include "kvm.h"
>>
>> kvm? Likely not what you want.
> cpu_synchronize_post_init is declared in kvm.h
> Any suggestions are welcome.

Indeed. Ugly. Guess we should move prototypes to cpus.h, probably
uninlining the generic helpers. But that's a separate story. For now
your version is OK then.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



reply via email to

[Prev in Thread] Current Thread [Next in Thread]