qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_sta


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
Date: Fri, 4 Jan 2013 11:29:26 +0100

On 04.01.2013, at 11:23, Bhushan Bharat-R65777 wrote:

> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:address@hidden
>> Sent: Friday, January 04, 2013 2:11 PM
>> To: Bhushan Bharat-R65777
>> Cc: Marcelo Tosatti; address@hidden; Christian Borntraeger; Anthony
>> Liguori; address@hidden qemu-devel
>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
>> do_kvm_cpu_synchronize_state data integrity issue
>> 
>> 
>> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Marcelo Tosatti [mailto:address@hidden
>>>> Sent: Friday, January 04, 2013 7:08 AM
>>>> To: Alexander Graf
>>>> Cc: address@hidden; Christian Borntraeger; Anthony
>>>> Liguori; qemu- address@hidden qemu-devel; Bhushan Bharat-R65777
>>>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
>>>> do_kvm_cpu_synchronize_state data integrity issue
>>>> 
>>>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
>>>>> 
>>>>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
>>>>> 
>>>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
>>>>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
>>>>>>>>> {
>>>>>>>>>   struct kvm_cpu_syncstate_args *args = _args;
>>>>>>>>> +    CPUArchState *env = args->env;
>>>>>>>>> +    int register_level = args->register_level;
>>>>>>>>> 
>>>>>>> This probably becomes more readable if we explicitly revert back
>>>>>>> to
>>>> unsynced state first:
>>>>>>> 
>>>>>>> /* Write back local modifications at our current level */ if
>>>>>>> (register_level > env->kvm_vcpu_dirty) {
>>>>>>>   kvm_arch_put_registers(...);
>>>>>>>   env->kvm_vcpu_dirty = 0;
>>>>>>> }
>>>>>>> 
>>>>>>> and then do the sync we are requested to do:
>>>>>>> 
>>>>>>> if (!env->kvm_vcpu_dirty) {
>>>>>>>   ...
>>>>>>> }
>>>>>> 
>>>>>> I agree, but only if we add a second conditional to the if 1st
>>>>>> statement  as
>>>> such:
>>>>>> 
>>>>>> if (args->env->kvm_vcpu_dirty && register_level >
>>>>>> env->kvm_vcpu_dirty)
>>>>>> 
>>>>>> This is to cover the case where the caller is asking for register level 
>>>>>> "1"
>>>> and we're already dirty at level "2". In this case, nothing should
>>>> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the
>> case.
>>>>> 
>>>>> As before, I'd prefer to make this explicit:
>>>>> 
>>>>>> 
>>>>>> static void do_kvm_cpu_synchronize_state(void *_args) {
>>>>>>  struct kvm_cpu_syncstate_args *args = _args;
>>>>>>  CPUArchState *env = args->env;
>>>>>>  int register_level = args->register_level;
>>>>> 
>>>>> if (register_level > env->kvm_vcpu_dirty) {
>>>>>   /* We are more dirty than we need to - all is well */
>>>>>   return;
>>>>> }
>>>>> 
>>>>>> 
>>>>>>  /* Write back local modifications at our current level */
>>>>>>  if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) {
>>>>>>      kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
>>>>>>      env->kvm_vcpu_dirty = 0;
>>>>>>  }
>>>>>> 
>>>>>>  if (!args->env->kvm_vcpu_dirty) {
>>>>>>      kvm_arch_get_registers(env, register_level);
>>>>>>      env->kvm_vcpu_dirty = register_level;
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> Do you agree?  Thanks for your time. :)
>>>>> 
>>>>> Please also check out the discussions I've had with Bharat about his
>>>>> watchdog
>>>> patches. There we need a mechanism to synchronize registers only when
>>>> we actually need to, in order to avoid potential race conditions with
>>>> a kernel timer.
>>>>> 
>>>>> That particular case doesn't work well with levels. We can have
>>>>> multiple
>>>> different potential race producers in the kernel that we need to
>>>> avoid individually, so we can't always synchronize all of them when
>>>> only one of them needs to be synchronized.
>>>>> 
>>>>> The big question is what we should be doing about this. We basically
>>>>> have 3
>>>> options:
>>>>> 
>>>>> * implement levels, treat racy registers as "manually
>>>>> synchronized", as
>>>> Bharat's latest patch set does
>>>>> * implement levels, add a bitmap for additional special
>>>>> synchronization bits
>>>>> * replace levels by bitmap
>>>>> 
>>>>> I'm quite frankly not sure which one of the 3 would be the best way 
>>>>> forward.
>>>>> 
>>>>> 
>>>>> Alex
>>>> 
>>>> Implement read/write accessors for individual registers, similarly to
>>>> how its done in the kernel. See kvm_cache_regs.h.
>>> 
>>> Read/write for individual registers can be heavy for cases where multiple
>> register needed to be updated. Is not it?
>> 
>> It depends. We can still have classes of registers to synchronize that we 
>> could
>> even synchronize using MANY_REG. For writes, things become even faster with
>> individual accessors since we can easily coalesce all register writes until 
>> we
>> hit the vcpu again into a MANY_REG array and write them in one go.
>> 
>>> 
>>> For cases where multiple register needed to be synchronized then I would 
>>> like
>> to elaborate the options as:
>>> 
>>> Option 1:
>>> int timer_func(CPUArchState env)
>>> {
>>>     cpu_synchronize_state(env);
>>>     //update env->timer_registers
>>>     env->updated_regs_type |= TIMER_REGS_TYPE_BIT;
>> 
>> To scale this one, it's probably also more clever to swap the logic:
>> 
>>    env->kvm_sync_extra |= SYNC_TIMER_REGS;
>>    cpu_synchronize_state(env); /* gets timer regs */
>>    /* update env->timer_registers */
>>    /* timer regs will be synchronized later because kvm_sync_extra has
>> SYNC_TIMER_REGS set */
>> 
>> Your case right now is special in that we don't need to get any register 
>> state,
>> but only write it. However, if we do
>> 
>>    cpu_synchronize_state(env); /* will not get timer regs */
>>    env->kvm_sync_extra |= SYNC_TIMER_REGS;
>> 
>> then the next
>> 
>>    cpu_synchronize_state(env);
>> 
>> would overwrite the timer values again. So we would also need to indicate 
>> that
>> the bits are dirty:
>> 
>>    cpu_synchronize_state(env); /* will not get timer regs */
>>    env->kvm_sync_extra |= SYNC_TIMER_REGS;
>>    env->kvm_sync_dirty |= SYNC_TIMER_REGS;
>> 
>> which cpu_synchronize_state() could use to make sure it doesn't read back any
>> timer regs again.
> 
> yes
> 
>> It would obviously also have to set it in its normal operation
>> when it executed first ;).
> 
> We also need get all registers like for debugging and we should have flag 
> like ALL_REGS etc.

Well, we could just take the current level of "general purpose registers" as 
one flag.

> 
> But I general I agree with the idea. 
> 
>> 
>>> }
>>> 
>>> Change the kvm_arch_put_registers() to add followings:
>>> 
>>> int kvm_arch_put_registers(CPUArchState *env, int level) {
>>> 
>>>     If (env->updated_regs_type) {
>>>             struct kvm_sregs sregs;
>>>             If (env->updated_regs_type & TIMER_REGS_TYPE_BIT) {
>>>                     // update sregs->timer_registers;
>>>                     // may be we can have a bitmap to tell kernel for what
>> actually updated
>>>             }
>>>             If (env->updated_regs_type & XX_CPU_REGS_TYPE) {
>>>                     // update respective registers in sregs
>>>             }
>>> 
>>>             ioctl(env, KVM_GET_SREGS, &sregs);
>>>     }
>>> }
>>> 
>>> This mechanism will get all registers state but this is required only once 
>>> per
>> entry in QEMU.
>> 
>> I don't understand this bit
>> 
>>> 
>>> 
>>> Option 2:
>>> Define read_regs_type() and write_regs_type() for cases where it requires
>> multiple registers updates:
>>> 
>>> int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type) {
>>>    -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
>>> 
>>>     Switch(reg_type) {
>>>     Case TIMER_REGS:
>>>             Struct *timer_regs = regs_ptr;
>>>             Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type);
>>>             Break;
>>> 
>>>     Default:
>>>             Printf(error);
>>> }
>>> 
>>> Similarly will be the write function:
>>> int write_regs_type((CPUArchState env, int reg_type) {
>>>    -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
>>> 
>>>     Switch(reg_type) {
>>>     Case TIMER_REGS:
>>>             Struct timer_regs;
>>>             Timer_regs.reg1 = env->timer_regs->reg1;
>>>             Timer_regs.reg2 = env->timer_regs->reg2;
>>>             Ioctl(env, KVM_SET_SREGS, &timer_regs, CPU_reg_type);
>>>             Break;
>>> 
>>>     Default:
>>>             Printf(error);
>>> }
>>> 
>>> Int timer_func(CPUxxState env)
>>> {
>>>   Struct timer_regs;
>>>   read_regs_type((env, &timer_regs,TIMER_REGS);
>>>   //update env->timer_registers
>>>   Write_regs_type(env, TIMER_REGS)
>>> }
>>> 
>>> This will get one type of register_types and can cause multiple IOCTL per
>> entry to QEMU.
>> 
>> This is still too explicit. How about
>> 
>> static inline void ppc_set_tsr(CPUState *env, target_ulong val) {
>>    env->kvm_sync_extra |= SYNC_TIMER_REGS;
>>    cpu_synchronize_registers(env);
>>    env->spr[SPR_TSR] = val;
> 
> You also want env->kvm_sync_dirty also, right?

Not quite, since SYNC_TIMER_REGS spans more than only TSR. So we need to make 
sure we fetch the non-TSR register values before we can declare TIMER_REGS as 
dirty.

> 
>> }
>> 
>> static inline void ppc_set_tcr(CPUState *env, target_ulong val) {
>>    env->kvm_sync_extra |= SYNC_TIMER_REGS;
>>    cpu_synchronize_registers(env);
>>    env->spr[SPR_TCR] = val;
> 
> Same as above 
> 
>> }
>> 
>> int timer_func(CPUState *env) {
>>    ppc_set_tsr(env, 0);
> 
> SYNC_TIMERS_REGS are get only once on first call.

Sure, but that's implicit. A user of these functions doesn't have to care which 
one comes first.


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]