qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap paramet


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state
Date: Thu, 24 Jan 2013 13:40:49 +0100

On 17.01.2013, at 00:01, Marcelo Tosatti wrote:

> On Wed, Jan 16, 2013 at 09:41:54PM +0100, Christian Borntraeger wrote:
>> On 16/01/13 21:21, Marcelo Tosatti wrote:
>>> On Wed, Jan 16, 2013 at 09:03:20PM +0100, Christian Borntraeger wrote:
>>>> On 16/01/13 17:05, Marcelo Tosatti wrote:
>>>> 
>>>>> The S/390 problem, from
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2012-11/msg02213.html:
>>>>> 
>>>>> ">>> The kvm register sync needs to happen in the kvm register sync
>>>>>>>> function :)
>>>>>>> That would eliminate the whole purpose of sync regs and forces us to
>>>>>>> have an
>>>>>>> expensive ioctl on lots of exits (again). I would prefer to sync the 
>>>>>>> registers
>>>>>>> that we never need in qemu just here.
>>>>>> 
>>>>>> That's why the register sync has different stages.
>>>>> 
>>>>> Not the get_register. Which is called on every synchronize_state. Which
>>>>> happen 
>>>>> quite often
>>>>> on s390."
>>>>> 
>>>>> But wait: on these S/390 codepaths, you do GET_REGS already, via
>>>>> cpu_synchronize_state.
>>>>> 
>>>>> So on S/390
>>>>> 
>>>>> - cpu_synchronize_state(env)
>>>>> - read any register from env
>>>>> 
>>>>> Is not valid? This is what generic code assumes.
>>>> 
>>>> TO recap the motiviation:
>>>> 
>>>> cpu_synchronize_state on s390 currently updates any register in env that is
>>>> used by qemu (general purpose, prefix, psw, control and access) in the 
>>>> normal
>>>> runtime. it turns out we have all of these regs in kvm_run, so we can do 
>>>> synchronize states without doing an additional ioctl call.
>>>> Now, for life migration and dump we need some additional registers (which 
>>>> are
>>>> only accessable via onereg interface). So synchronize_state would need to
>>>> do 3 or 4 additional system calls on the hot path, only to take care of 
>>>> something that is not on the hot path at all.
>>>> For historic reasons, we have one exit code for almost all exits. 
>>>> Therefore,
>>>> we need to call synchronize_states almost always.
>>>> We could now start to have a poor mans synchronize_state in arch code, but
>>>> that would collide with common code synchronize_state if done at the wrong
>>>> time. Thus we want to make common code capable of having only a subset of
>>>> the register synched - by making it possible to sync the other regs later
>>>> on if needed without wiping the former sync.
>>>> 
>>>> Makes sense?
>>>> 
>>>> Christian
>>> 
>>> Yes. As noted in the last email on the thread, runtime/reset/full are to
>>> serapate sets of registers when writing _to_ kernel. When reading _from_
>>> kernel, reset and full distinctions are not appropriate (any register
>>> can change, as far as knowledge goes).
>> 
>> Hmm, I probably did not understood your point, so I will try to explain mine
>> and see what you respond :-)
>> 
>> The point of the patch set, is to allow this distinction when reading. 
>> In other words it allows code to state: I am only interested in regxy and 
>> dont
>> care if the other regs in env are out of sync.
> 
> Fine.
> 
>> If a full sync is necessary later on the other regs are synched as well.
>> If a full sync was already done before a partial get becomes a no-op.
> 
> -> FULL is the set of registers written when loadvm/initialization is
> performed.
> -> RESET, a subset of full, is a set of registers written on SYSTEM
> RESET.
> -> RUNTIME, a subset of RESET, is a set of registers written during
> RUNTIME.
> 
> To write both the RESET and FULL set of registers during runtime,
> contradicts the description above for both RESET and FULL.
> 
> Two examples from i386:
> 
>    if (level == KVM_PUT_FULL_STATE) {
>        /*
>         * KVM is yet unable to synchronize TSC values of multiple VCPUs
>         * on
>         * writeback. Until this is fixed, we only write the offset to
>         * SMP
>         * guests after migration, desynchronizing the VCPUs, but
>         * avoiding
>         * huge jump-backs that would occur without any writeback at
>         * all.
>         */
>       ...
>    }
> 
> And:
> 
>    /*
>     * The following paravirtual MSRs have side effects on the guest or
>     * are
>     * too heavy for normal writeback. Limit them to reset or full state
>     * updates.
>     */
> 
>> Why should that be not possible.
> 
> It should, but separately from FULL/RESET/RUNTIME distinction.
> This sequence
> 
>       get_regs(FULLSTATE)
>       put_regs(FULLSTATE)
> 
> During runtime is not allowed. And only syncing the RUNTIME set of
> registers during and leaving the FULL set of registers marked as
> dirty is confusing also. 
> 
> So perhaps what you'd want is selective read/write of RUNTIME registers
> as suggested.
> 
> 
> Date: Fri, 4 Jan 2013 23:49:42 -0200
> From: Marcelo Tosatti <address@hidden>
> To: "Jason J. Herne" <address@hidden>
> Cc: Alexander Graf <address@hidden>,
>       Bhushan Bharat-R65777 <address@hidden>,
>       Christian Borntraeger <address@hidden>,
>       Anthony Liguori <address@hidden>,
>       "address@hidden qemu-devel" <address@hidden>
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> On Fri, Jan 04, 2013 at 10:25:45AM -0500, Jason J. Herne wrote:
>> If I've followed the conversation correctly this is what needs to be done:
>> 
>> 1. Remove the level parameters from kvm_arch_get_registers and
>> kvm_arch_put_registers.
>> 
>> 2. Add a new bitmap parameter to kvm_arch_get_registers and
>> kvm_arch_put_registers.
>> 
>> 3. Define a bit that correlates to our current notion of "all
>> runtime registers".  This bit, and all bits in this bitmap, would be
>> architecture specific.
>> 
>> 4. Remove the cpustate->kvm_sync_dirty field.  Replace it with a
>> bitmap that tracks which bits are dirty and need to be synced back
>> to KVM-land.
>> 
>> 5. As we do today, we'll assume registers are dirty and turn on
>> their corresponding bit in this new bitmap whenever we "get" the
>> registers from KVM.
>> 
>> 6. Add other bits as needed on a case by case basis.
>> 
>> Does this seem to match what was discussed, and what we want to do?
>> 
>> 
>> -- 
>> -- Jason J. Herne (address@hidden)
> 
> The s390 change to use runtime_state is self-contained and could be
> integrated now from my pov.
> 
> The suggestion was based on the observation that the ppc guys are trying 
> to fix their timer problem and that should be a generic solution,
> per-register granularity.
> 
> Perhaps something along the lines of 
> 
> init
> all registers marked as cached (read/write accessors don't 
> synchronize state)
> 
> Around ioctl(KVM_RUN) have:
> 
> if regs dirty
>       writeback
> r = ioctl(KVM_RUN)
> mark runstate registers as not cached
> 
> Then
> 
> read_reg(x)
>       if x not cached
>               arch_get_regs(RUNTIME_STATE) (*)
> 
> write_reg(x, val)
>       read_reg(x)
>       cpustate->x = val;
>       mark_dirty(x)
> 
> Which is basically the pattern used in KVM x86 (but instead of
> ioctl(KVM_RUN) there is VMENTRY).

But that would mean that any code in QEMU that accesses registers can't access 
env-> ,but instead needs to go through an accessor function. That's a lot of 
potential for subtile error, no?

I think for now the best choice for get_regs() would be to ignore the 
FULL/RESET bits and always keep the syncing as it happens today under the 
RUNTIME umbrella only. So all of get_regs() only checks for RUNTIME.

Whenever get_xxx() happens, a bit gets set for set_xxx(). Up to this point, 
only the RUNTIME bit is ever set, because that's what 
cpu_synchronize_registers() sets.

Then s390 can add special separate bits for "sync GPRs" and "sync CRs". 
SYNC_RUNTIME would include those bits. The kvm hypercall exit calls a new 
synchronize_registers() function with a parameter telling it to only sync GPRs. 
This marks GPRs dirty, but not RUNTIME. The set_registers() function in s390 
specific code could handle this particular case specially.

That way everything's solved and scalable, no?


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]