qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] spapr: add initial ibm, client-architecture


From: Alexander Graf
Subject: Re: [Qemu-devel] [RFC PATCH] spapr: add initial ibm, client-architecture-support rtas call support
Date: Wed, 4 Sep 2013 15:37:30 +0200

On 04.09.2013, at 15:08, Alexey Kardashevskiy wrote:

> On 09/04/2013 10:13 PM, Alexander Graf wrote:
>> 
>> On 04.09.2013, at 13:40, Alexey Kardashevskiy wrote:
>> 
>>> On 09/04/2013 08:42 PM, Alexander Graf wrote:
>>>> 
>>>> On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:
>>>> 
>>>>> This is an RFC patch.
>>>>> 
>>>>> The modern Linux kernel supports every known POWERPC CPU so when
>>>>> it boots, it can always find a matching cpu_spec from the cpu_specs array.
>>>>> However if the kernel is quite old, it may be missing the definition of
>>>>> the actual CPU. To provide ability for old kernels to work on modern
>>>>> hardware, a Logical Processor Version concept was introduced in PowerISA.
>>>>> From the hardware prospective, it is supported by PCR (Processor
>>>>> Compatibility Register) which is defined in PowerISA. The register
>>>>> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
>>>>> 
>>>>> PAPR+ specification defines a Logical Processor Version per every
>>>>> version of PowerISA specification. PAPR+ also defines
>>>>> a ibm,client-architecture-support rtas call which purpose is to provide
>>>>> a negotiation mechanism for the guest and the hypervisor to work out
>>>>> the best Logical Processor Version to continue with.
>>>>> 
>>>>> At the moment, the Linux kernel calls the ibm,client-architecture-support
>>>>> method and only then reads the device. The current RTAS's handler checks
>>>>> the capabilities from the array supplied by the guest kernel, analyses
>>>>> if QEMU can or cannot provide with the requested features.
>>>>> If QEMU supports everything the guest has requested, it returns from rtas
>>>>> call and the guest continues booting.
>>>>> If some parameter changes, QEMU fixes the device tree and reboots
>>>>> the guest with a new tree.
>>>>> 
>>>>> In this version, the ibm,client-architecture-support handler checks
>>>>> if the current CPU is in the list from the guest and if it is not, QEMU
>>>>> adds a "cpu-version" property to a cpu node with the best of logical PVRs
>>>>> supported by the guest.
>>>>> 
>>>>> Technically QEMU reboots and as a part of reboot, it fixes the tree and
>>>>> this is when the cpu-version property is actually added.
>>>>> 
>>>>> Although it seems possible to add a custom interface between SLOF and QEMU
>>>>> and implement device tree update on the fly to avoid a guest reboot,
>>>>> there still may be cases when device tree change would not be enough.
>>>>> As an example, the guest may ask for a bigger RMA area than QEMU allocates
>>>>> by default.
>>>>> 
>>>>> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
>>>>> 
>>>>> Cc: Nikunj A Dadhania <address@hidden>
>>>>> Cc: Andreas Färber <address@hidden>
>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>> ---
>>>>> hw/ppc/spapr.c              | 10 ++++++
>>>>> hw/ppc/spapr_hcall.c        | 76 
>>>>> +++++++++++++++++++++++++++++++++++++++++++++
>>>>> include/hw/ppc/spapr.h      |  7 ++++-
>>>>> target-ppc/cpu-models.h     | 13 ++++++++
>>>>> target-ppc/cpu-qom.h        |  1 +
>>>>> target-ppc/translate_init.c |  3 ++
>>>>> 6 files changed, 109 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index 13574bf..5adf53c 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, 
>>>>> sPAPREnvironment *spapr)
>>>>>       if (ret < 0) {
>>>>>           return ret;
>>>>>       }
>>>>> +
>>>>> +        if (spapr->pvr_new) {
>>>>> +            ret = fdt_setprop(fdt, offset, "cpu-version",
>>>>> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
>>>>> +            if (ret < 0) {
>>>>> +                return ret;
>>>>> +            }
>>>>> +            /* Reset as the guest after reboot may give other PVR set */
>>>>> +            spapr->pvr_new = 0;
>>>>> +        }
>>>>>   }
>>>>>   return ret;
>>>>> }
>>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>>>> index 9f6e7b8..509de89 100644
>>>>> --- a/hw/ppc/spapr_hcall.c
>>>>> +++ b/hw/ppc/spapr_hcall.c
>>>>> @@ -3,6 +3,7 @@
>>>>> #include "helper_regs.h"
>>>>> #include "hw/ppc/spapr.h"
>>>>> #include "mmu-hash64.h"
>>>>> +#include "cpu-models.h"
>>>>> 
>>>>> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>>                          target_ulong opcode, target_ulong *args)
>>>>> @@ -792,6 +793,78 @@ out:
>>>>>   return ret;
>>>>> }
>>>>> 
>>>>> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>>>> +                                                  sPAPREnvironment 
>>>>> *spapr,
>>>>> +                                                  target_ulong opcode,
>>>>> +                                                  target_ulong *args)
>>>>> +{
>>>>> +    target_ulong list = args[0];
>>>>> +    int i, number_of_option_vectors;
>>>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>>>> +    bool cpu_match = false;
>>>>> +    unsigned compat_cpu_level = 0, pvr_new;
>>>>> +
>>>>> +    /* Parse PVR list */
>>>>> +    for ( ; ; ) {
>>>>> +        uint32_t pvr, pvr_mask;
>>>>> +
>>>>> +        pvr_mask = ldl_phys(list);
>>>>> +        list += 4;
>>>>> +        pvr = ldl_phys(list);
>>>>> +        list += 4;
>>>>> +
>>>>> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
>>>>> +            cpu_match = true;
>>>>> +            pvr_new = cpu->env.spr[SPR_PVR];
>>>>> +        }
>>>>> +
>>>>> +        /* Is it a logical PVR? */
>>>>> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == 
>>>>> CPU_POWERPC_LOGICAL_MASK) {
>>>>> +            switch (pvr) {
>>>>> +            case CPU_POWERPC_LOGICAL_2_05:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
>>>>> +                    (compat_cpu_level < 2050)) {
>>>>> +                    compat_cpu_level = 2050;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            case CPU_POWERPC_LOGICAL_2_06:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>>> +                    (compat_cpu_level < 2060)) {
>>>>> +                    compat_cpu_level = 2060;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>>> +                    (compat_cpu_level < 2061)) {
>>>>> +                    compat_cpu_level = 2061;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (~pvr_mask & pvr) {
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    /* Parsing finished. Making decision whether to reboot or not */
>>>>> +    if (cpu_match) {
>>>>> +        return H_SUCCESS;
>>>>> +    }
>>>>> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
>>>>> +        return H_SUCCESS;
>>>>> +    }
>>>>> +
>>>>> +    cpu->env.spr[SPR_PVR] = pvr_new;
>>>> 
>>> 
>>>> This would change the SPR value the guest sees. IIUC this is not what is
>>>> happening with this call.
>>> 
>>> I am always trying to avoid adding new variables but yes, this is not the
>>> case, I'll fix it.
>>> 
>>>> 
>>>>> +    spapr->pvr_new = pvr_new;
>>>>> +    qemu_system_reset_request();
>>>> 
>>> 
>>>> I think there should be a way to define the compat mode from the
>>>> beginning without the need to reboot the machine from itself. 
>>> 
>>> That is a different problem which I do not how to solve in a way to make
>>> everybody happy. Add logical CPUs to every CPU family (at least for
>>> POWER7/7+/8)?
>> 
> 
>> I'm not sure the CPU is really the right place to put this information.
>> After all, you are not changing the CPU itself, you're only changing a
>> few bits inside of that CPU that make it appear closer to the legacy CPU
>> plus a few device tree changes.
> 
>> So IMHO this whole thing should be orthogonal to -cpu.
> 
> Well, since we cannot change CPU class on the fly, yes, it should be a
> "compatibility" flags/properties/methods/whatever of the default CPU for
> the specific family.

Since it's machine global, it could just as well be a machine option, no? Or 
can you have multiple CPUs with different compat modes in a single system?

> 
> 
>>>> That way
>>>> management tools can straight on create POWER6 compat machines without
>>>> jumping through reboot hoops.
>>> 
>>> One of the examples (came from Paul) is:
>>> the host runs on POWER8, the guest boots a kernel which is capable of
>>> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
>>> POWER7-compat mode. Then the guest does "yum update" and gets POWER8 kernel
>>> installed so when it reboots, it will boot in normal POWER8 mode. Everybody
>>> is happy.
>>> 
>>> Having POWER7-compat mode set from the very beginning will break this
>>> behaviour.
>> 
>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to 
>> a newer compat later on, no?
> 
> Bump when exactly? And it won't be a new compat, it will be a native CPU. I

If you configure your guest to boot in POWER7-compat mode on your POWER8 box 
and it then tells you through ibm,client-architecture-support that it can do 
POWER8, we can just remove all the compat bits and be happy, no?

> thought you are totally against reboots and you want libvirt (for example)
> to detect that this was ibm,client-architecture-support and reboot the
> guest with new CPU type (or machine option).

I'm not sure which way the best one forward is, but at first we need to get 
something working that feels natural for people coming from x86 as well as pHyp.

So I think we should enable both use cases. We should do the on-the-fly 
transitioning in QEMU with the chance of libvirt intercepting it. I don't think 
the intercepting part is a hard requirement today, but we should keep it in 
mind to know the full picture.

That way a client-architecture-support rtas call can either reboot the system 
and automatically do "the right thing" and/or it can tell libvirt via QMP that 
we changed the client-architecture-support which means libvirt now has the 
chance to reconfigure the default setting next time it spawns a VM.

Which gets us to the third bit. You need to be able to tell QEMU what the 
"default" client-architecture-support level is when you boot a VM to make sure 
that a POWER7 guest on a POWER8 system doesn't always reboot to reconfigure 
itself first thing when it boots up. As Ben mentioned, migration is obviously a 
strong point for this one too.

So what do we need?

  - machine flag to indicate compat level
  - machine reset evaluates compat level, adjusts VM accordingly
  - rtas call merely modifies machine flag and triggers a reset
  - optional: QMP notifier when the rtas call changed the machine flag


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]