qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5] spapr: Add support for time base offset migr


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH v5] spapr: Add support for time base offset migration
Date: Mon, 14 Apr 2014 18:10:39 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 04/14/2014 05:46 PM, Alexander Graf wrote:
> 
> On 14.04.14 09:33, Alexey Kardashevskiy wrote:
>> On 04/14/2014 05:08 PM, Alexander Graf wrote:
>>> On 12.04.14 17:52, Alexey Kardashevskiy wrote:
>>>> This allows guests to have a different timebase origin from the host.
>>>>
>>>> This is needed for migration, where a guest can migrate from one host
>>>> to another and the two hosts might have a different timebase origin.
>>>> However, the timebase seen by the guest must not go backwards, and
>>>> should go forwards only by a small amount corresponding to the time
>>>> taken for the migration.
>>>>
>>>> This is only supported for recent POWER hardware which has the TBU40
>>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>>> 970.
>>>>
>>>> This adds kvm_access_one_reg() to access a special register which is not
>>>> in env->spr.
>>>>
>>>> The feature must be present in the host kernel.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>> Changes:
>>>> v5:
>>>> * fixed multiple comments in cpu_ppc_get_adjusted_tb and merged it
>>>> into timebase_post_load()
>>>> * removed round_up(1<<24) as KVM is expected to do this anyway
>>>> * removed @freq from migration stream
>>>> * renamed PPCTimebaseOffset to PPCTimebase
>>>> * CLOCKS_PER_SEC is used as a constant which 1000000us/s (man clock)
>>>>
>>>> v4:
>>>> * made it per machine timebase offser rather than per CPU
>>>>
>>>> v3:
>>>> * kvm_access_one_reg moved out to a separate patch
>>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>>> the destionation does not really care of offset on the source
>>>>
>>>> v2:
>>>> * bumped the vmstate_ppc_cpu version
>>>> * defined version for the env.tb_env field
>>>> ---
>>>>    hw/ppc/ppc.c           | 76
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    hw/ppc/spapr.c         |  4 +--
>>>>    include/hw/ppc/spapr.h |  1 +
>>>>    target-ppc/cpu-qom.h   | 15 ++++++++++
>>>>    target-ppc/kvm.c       |  5 ++++
>>>>    trace-events           |  3 ++
>>>>    6 files changed, 102 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>>> index 71df471..bf61a0a 100644
>>>> --- a/hw/ppc/ppc.c
>>>> +++ b/hw/ppc/ppc.c
>>>> @@ -29,9 +29,11 @@
>>>>    #include "sysemu/cpus.h"
>>>>    #include "hw/timer/m48t59.h"
>>>>    #include "qemu/log.h"
>>>> +#include "qemu/error-report.h"
>>>>    #include "hw/loader.h"
>>>>    #include "sysemu/kvm.h"
>>>>    #include "kvm_ppc.h"
>>>> +#include "trace.h"
>>>>      //#define PPC_DEBUG_IRQ
>>>>    //#define PPC_DEBUG_TB
>>>> @@ -829,6 +831,80 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>>> uint32_t freq)
>>>>        cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>>    }
>>>>    +static void timebase_pre_save(void *opaque)
>>>> +{
>>>> +    PPCTimebase *tb = opaque;
>>>> +    uint64_t ticks = cpu_get_real_ticks();
>>>> +    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
>>>> +
>>>> +    if (!first_ppc_cpu->env.tb_env) {
>>>> +        error_report("No timebase object");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    tb->time_of_the_day = get_clock_realtime() / 1000;
>>> It'd be good to indicate in the field name that we're in us units. In fact
>>> it probably makes sense to use ns throughout and not convert :).
>>> Nanoseconds are QEMU's "native" internal time format.
>> Ooookay, I'll make it "ns", then no indicator is needed, right?
> 
> I still prefer to keep an indicator. Makes it more readable.
> 
>>
>>
>>>> +    /*
>>>> +     * tb_offset is only expected to be changed by migration so
>>>> +     * there is no need to update it from KVM here
>>>> +     */
>>>> +    tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
>>>> +}
>>>> +
>>>> +static int timebase_post_load(void *opaque, int version_id)
>>>> +{
>>>> +    PPCTimebase *tb = opaque;
>>>> +    CPUState *cpu;
>>>> +    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
>>>> +    int64_t tb_off_adj, tb_off;
>>>> +    int64_t migration_duration_us, migration_duration_tb, guest_tb,
>>>> host_us;
>>>> +    unsigned long freq;
>>>> +
>>>> +    if (!first_ppc_cpu->env.tb_env) {
>>>> +        error_report("No timebase object");
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    freq = first_ppc_cpu->env.tb_env->tb_freq;
>>>> +    /*
>>>> +     * Calculate timebase on the destination side of migration.
>>>> +     * The destination timebase must be not less than the source
>>>> timebase.
>>>> +     * We try to adjust timebase by downtime if host clocks are not
>>>> +     * too much out of sync (1 second for now).
>>>> +     */
>>>> +    host_us = get_clock_realtime() / 1000;
>>>> +    migration_duration_us = MIN(CLOCKS_PER_SEC, host_us -
>>>> tb->time_of_the_day);
>>> CLOCKS_PER_SEC only make sense with the return value of clock(). I wouldn't
>>> be surprised if this calculation goes totally bonkers on BSD or Windows
>>> systems.
>>
>> http://linux.die.net/man/3/clock
>> C89, C99, POSIX.1-2001. POSIX requires that CLOCKS_PER_SEC equals 1000000
>> independent of the actual resolution.
>>
>> This should be valid for BSD too. Ok, this has no effect on Windows.
>> Replace definition with get_ticks_per_sec()? Defining USEC_PER_SEC (or
>> NSEC_xxx) does not make much sense.
> 
> I'd just define NSEC_PER_SEC.
> 
>>
>>> I still don't understand why we're doing a MIN() here. I would understand a
>>> MAX() to guard against unsynchronized times on source/destination.
>> If I added time difference between out-of-sync hosts, the delta would be
>> huge and the destination guest would behave really weird until this time
>> difference is over. Guests can cope with accidental timebase jump of
>> seconds but not minutes.
> 
> Exactly. That's MAX(), right?


Sorry, I am not following you here :(
MAX gives bigger number than MIN and bigger number is what I want to avoid.
I even tried both, MIN works, MAX does not.



-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]