qemu-devel

Re: [Qemu-devel] [PATCH] x86: Reset MTRR on vCPU reset


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH] x86: Reset MTRR on vCPU reset
Date: Thu, 14 Aug 2014 01:17:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 08/14/14 00:06, Alex Williamson wrote:
> On Wed, 2014-08-13 at 22:33 +0200, Laszlo Ersek wrote:
>> a number of comments -- feel free to address or ignore each as you see fit:
>>
>> On 08/13/14 21:09, Alex Williamson wrote:

>>> mappings which are now stale after reset.  The result is that OVMF
>>> rebooting on such a configuration takes a full minute to LZMA
>>> decompress the EFI volume, a process that is nearly instant on the
>>
>> For pedantry, instead of "EFI volume" we could say "LZMA-compressed
>> Firmware File System file in the FVMAIN_COMPACT firmware volume".
> 
> Can you come up with something with maybe half that many words?

"Firmware volume" then. "Firmware volume" is not a generic term, it's a
specific term in the Platform Initialization (PI) spec.

> And
> also, does it matter?

No. :)

> I want someone using OVMF and experiencing a long
> reboot delay to know that this might fix their problem.  Noting that the
> major time consuming stall is in the LZMA decompression code helps to
> rationalize why the mapping change is important.  The specific blob of
> data that's being decompressed seems mostly irrelevant, which is why I
> only gave it 2 words.

Fair enough, it's just that "EFI volume" doesn't mean anything specific
(to me), while "firmware volume" does.

>>> @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>>      CPUX86State *env = &cpu->env;
>>>      struct {
>>>          struct kvm_msrs info;
>>> -        struct kvm_msr_entry entries[100];
>>> +        struct kvm_msr_entry entries[128];
>>>      } msr_data;
>>>      struct kvm_msr_entry *msrs = msr_data.entries;
>>>      int n = 0, i;
>>> @@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>>              kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
>>>                                env->msr_hv_tsc);
>>>          }
>>> +        if (has_msr_mtrr) {
>>> +            kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
>>> +            for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
>>> +                kvm_msr_entry_set(&msrs[n++],
>>> +                                  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
>>> +            }
>>> +        }
>>>  
>>>          /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
>>>           *       kvm_put_msr_feature_control. */
>>>
>>
>> I think that this code is correct (and sufficient for the reset
>> problem), but I'm uncertain if it's complete:
>>
>> (a) Shouldn't you put the matching PhysBase registers as well (for the
>> variable range ones)?
>>
>> Plus, shouldn't you put mtrr_fixed[11] too (MSR_MTRRfix64K_00000, ...)?
> 
> If my change wasn't isolated to the reset portion of kvm_put_msrs() then
> I would agree with you.  But since it is, all of those registers are
> undefined by the SDM.

That's a good way to express your point indeed, and a good way to
formulate my concern: I'm not sure your change is isolated to the reset
portion. The check that "gates" the new hunk says

  level >= KVM_PUT_RESET_STATE

and a higher level than that does exist: KVM_PUT_FULL_STATE, which is
used in incoming migration.

>> (b) You only modify kvm_put_msrs(). What about kvm_get_msrs()? I can see
>> that you make the msr putting dependent on:
>>
>>     /*
>>      * The following MSRs have side effects on the guest or are too
>>      * heavy for normal writeback. Limit them to reset or full state
>>      * updates.
>>      */
>>     if (level >= KVM_PUT_RESET_STATE) {
>>
>> But that's probably not your reason for omitting matching new code from
>> kvm_get_msrs(): "HV_X64_MSR_REFERENCE_TSC" is also heavy-weight (visible
>> in your patch's context), but that one is nevertheless handled in
>> kvm_get_msrs().
>>
>> My only reason for (b) is simply symmetry. For example, commit 48a5f3bc
>> added HV_X64_MSR_REFERENCE_TSC at once to both put() and get().
>>
>> According to "target-i386/machine.c", mtrr_deftype and co. are even
>> migrated (part of vmstate), so this asymmetry could become a problem in
>> migration. Eg. source host doesn't fetch MTRR state from KVM, hence wire
>> format carries garbage, but on the target you put (part of) that garbage
>> (right now, just the mask) back into KVM:
>>
>> do_savevm()
>>   qemu_savevm_state()
>>     qemu_savevm_state_complete()
>>       cpu_synchronize_all_states()
>>         cpu_synchronize_state()
>>           kvm_cpu_synchronize_state()
>>             do_kvm_cpu_synchronize_state()
>>               kvm_arch_get_registers()
>>                 kvm_get_msrs()
>>
>> do_loadvm()
>>   load_vmstate()
>>     qemu_loadvm_state()
>>       cpu_synchronize_all_post_init()
>>         cpu_synchronize_post_init()
>>           kvm_cpu_synchronize_post_init()
>>             kvm_arch_put_registers(..., KVM_PUT_FULL_STATE)
>>               kvm_put_msrs(..., KVM_PUT_FULL_STATE)
>>
>> /* state subset modified during VCPU reset */
>> #define KVM_PUT_RESET_STATE     2
>>
>> /* full state set, modified during initialization or on vmload */
>> #define KVM_PUT_FULL_STATE      3
>>
>> Hence I suspect (a) and (b) should be handled.
>>
>> ... And then we arrive at cross-version migration, where both source and
>> target hosts support MTRR, but the source qemu sends unsynchronized MTRR
>> data (ie. garbage) in the migration stream, and the target passes it to
>> KVM. I don't know if this is possible, and if so, what to do about it. :(
> 
> Where does the target pass it to KVM?

That's what I tried to show with the 2nd call stack above, the one
- that is rooted in do_loadvm(),
- and ends in kvm_put_msrs(),
- with "level" equalling KVM_PUT_FULL_STATE (--> 3),
- which the gate, ie. level >= KVM_PUT_RESET_STATE (--> 2), will let
  across.

(I do see that right now the patch passes only a part of the MTRR state
to KVM, but *some* part it does pass.)

> I think you've identified that we
> migrate unsynchronized data, but the good news is that we don't do
> anything with it unless you're running under TCG (in which case it is
> synchronized anyway).

That's where I disagree (but hopefully I'm just confused). I think that
once your patch is applied, part of the MTRR state, read from the
migration stream, will be sent to KVM. Because do_loadvm() leads to
kvm_put_msrs(), with level = KVM_PUT_FULL_STATE >= KVM_PUT_RESET_STATE.

> We neither load nor store the MTRR state from/to
> KVM,

(We don't load, I agree; we don't store, I disagree (after your patch).)

> which may have implications if you were to boot a guest, migrate
> it, then hot-add an assigned device where we need to start caring about
> guest mappings.
> 
>> (BTW,
>>
>>         VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, 8, 8),
>>
>> should be rebased to MSR_MTRRcap_VCNT too, probably.)
>>
>> Apologies about the verbiage, I just wrote down whatever crossed my
>> mind. I don't think I said anything overly important, but I feel unsafe
>> about giving my R-b until someone disproves my migration worries.
>> (Basically, before the patch, whatever MTRR data was in the migration
>> stream never reached KVM. This changes now.)
> 
> Not really because it only gets pushed to KVM on vCPU reset

(and on KVM_PUT_FULL_STATE >= KVM_PUT_RESET_STATE)

> and we're
> clearing the necessary enable/valid bits.  The rest is undefined anyway.

Indeed, when the put is a consequence of the VM being reset. But I think
that the put is reachable differently (ie. on incoming migration), and
consistency would be important then.

> 
>> ... Is the following argument valid in your opinion?
>>
>>   KVM cares about guest-specified MTRR values *only* when
>>   kvm_arch_has_noncoherent_dma() returns true to vmx_get_mt_mask().
>>   Since "kvm_arch_has_noncoherent_dma() returning true" (ie. device
>>   assignment) excludes migration anyway, we don't have to care about
>>   migration of MTRRs.
> 
> I think we do need to care about migration of MTRRs because a device can
> be hot attached on the migration target while the MTRRs could have been
> programmed on the migration source.  Therefore it doesn't matter that
> device assignment excludes migration.  This patch still seems correct to
> me, but you have identified another issue in the same problem space.
> I'll start working on it.  Thanks,

I certainly invite you and others to correct me; I'm a novice in this area.

Anyway, I had a thought that puts even my migration worries to shame :)
Here goes:

In x86_cpu_reset(), could you debug-log the *prior* value of
"env->mtrr_var[i].mask", ie. before you mask out the valid bit? (Same
for "env->mtrr_deftype", before you mask out the global Enable bit?)

I believe that under KVM, those env->mtrr_XXX fields are *always* zero
(dating back to CPUX86State's z-allocation time). Because, where would
we set them to anything nonzero?

$ git grep mtrr_deftype
target-i386/cpu.h:    uint64_t mtrr_deftype;
target-i386/machine.c:        VMSTATE_UINT64_V(env.mtrr_deftype, X86CPU, 8),
target-i386/misc_helper.c:        env->mtrr_deftype = val;
target-i386/misc_helper.c:        val = env->mtrr_deftype;

"target-i386/misc_helper.c" is TCG, isn't it?

The assignment to "env->mtrr_deftype" happens in function helper_wrmsr()
[target-i386/misc_helper.c], which is *only* called in the following chain:

gen_intermediate_code_internal() [target-i386/translate.c]
  disas_insn()                   [target-i386/translate.c]
    gen_helper_wrmsr()           [target-i386/misc_helper.c]

(gen_helper_wrmsr() resolves to helper_wrmsr() through some ingenious
hacks in "include/exec/helper-gen.h"; run
'git grep -e DEF_HELPER --and -e wrmsr', and then check DEF_HELPER_1().)

So, my theory is that under KVM, the hunk for x86_cpu_reset() doesn't
actually clear any set bits (because all those bits are already zero),
and that the kvm_put_msrs() hunk simply writes all-bits-zero values
(which certainly conforms to the reset requirements!) And this happens
exactly because the patch never *loads* MTRR state from KVM.

Here's a summary of this mess of an email:

- With TCG, I think everything just works, and the hunk for
  x86_cpu_reset() improves MTRR conformance.

- With KVM, the lack of loading MTRR state from KVM, combined with the
  (partial) storing of MTRR state to KVM, has two consequences:
  - migration invalidates (loses) MTRR state,
  - without migration, the clearing actions in x86_cpu_reset() have
    no effect (0 -> 0), but this is masked by the fact that
    all-bits-zero values for the registers in question happen to be
    correct for resetting.

- Both the store (which is now partial) and the load (which is
  nonexistent) should be complete instead, because the store is
  reachable on the incoming migration path too, not just when resetting.

I'm sorry if I sound crazy, but the above should be easy to disprove (by
logging the prior values in x86_cpu_reset(), on KVM, and by setting a
breakpoint on the incoming qemu process).

Thanks,
Laszlo


