From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SMI feature
Date: Wed, 18 Jan 2017 19:06:59 +0100

On Wed, 18 Jan 2017 18:23:45 +0100
Laszlo Ersek <address@hidden> wrote:

> On 01/18/17 17:26, Igor Mammedov wrote:
> > On Wed, 18 Jan 2017 16:42:27 +0100
> > Laszlo Ersek <address@hidden> wrote:
> >   
> >> On 01/18/17 13:38, Igor Mammedov wrote:  
> >>> On Wed, 18 Jan 2017 11:23:48 +0100
> >>> Laszlo Ersek <address@hidden> wrote:
> >>>     
> >>>> On 01/18/17 11:19, Laszlo Ersek wrote:    
> >>>>> On 01/18/17 11:03, Igor Mammedov wrote:      
> >>>>>> On Tue, 17 Jan 2017 19:53:21 +0100
> >>>>>> Laszlo Ersek <address@hidden> wrote:
> >>>>>>      
> >>>>
> >>>> [snip]
> >>>>    
> >>>>>>> This is the code I wrote (extracting a new ich9_apm_broadcast_smi()
> >>>>>>> function from ich9_apm_ctrl_changed(), for size reasons):
> >>>>>>>      
> >>>>>>>> static void ich9_apm_broadcast_smi(void)
> >>>>>>>> {
> >>>>>>>>     CPUState *cs;
> >>>>>>>>
> >>>>>>>>     cpu_synchronize_all_states(); /* <--------- mark this */      
> >>>>>> it should probably be executed after pause_all_vcpus();
> >>>>>> maybe that will help.
> >>>>>>      
> >>>>>>>>     CPU_FOREACH(cs) {
> >>>>>>>>         X86CPU *cpu = X86_CPU(cs);
> >>>>>>>>         CPUX86State *env = &cpu->env;
> >>>>>>>>
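> >>>>>>>>         /* SMBASE and EIP are both still at their power-on
> >>>>>>>>          * values, i.e. the CPU is in the reset state */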
> >>>>>>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
> >>>>>>>>             CPUClass *k = CPU_GET_CLASS(cs);
> >>>>>>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> >>>>>>>>
> >>>>>>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
> >>>>>>>>             continue;
> >>>>>>>>         }
> >>>>>>>>
> >>>>>>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >>>>>>>>     }
> >>>>>>>> }        
> >>>>>>>      
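For context, the trace event used above would be declared along these
lines in hw/isa/trace-events (a sketch of the expected format, not
necessarily the exact line from the patch):

ich9_apm_broadcast_smi_skip(uint64_t cpu_arch_id) "cpu_arch_id 0x%" PRIx64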
> >>>>>> [...]      
> >>>>>>> (b) If I add the cpu_synchronize_all_states() call before the loop, 
> >>>>>>> then
> >>>>>>> the incorrect filter matches go away; QEMU sees the KVM VCPU states
> >>>>>>> correctly, and the SMI is broad-cast.
> >>>>>>>
> >>>>>>> However, in this case, the boot slows to a *crawl*. It's unbearable. 
> >>>>>>> All
> >>>>>>> VCPUs are pegged at 100%, and you can easily follow the OVMF debug log
> >>>>>>> with the naked eye, almost line by line.
> >>>>>>> I didn't expect that cpu_synchronize_all_states() would be so costly,
> >>>>>>> but it makes the idea of checking VCPU state in the APM_CNT handler a
> >>>>>>> non-starter.      
> >>>>>> I wonder where the bottleneck is that slows the guest down so much.
> >>>>>> What is the actual cost of calling cpu_synchronize_all_states()?
> >>>>>>
> >>>>>> Maybe calling pause_all_vcpus() before cpu_synchronize_all_states()
> >>>>>> would help.      
> >>>>>
> >>>>> If I prepend just a pause_all_vcpus() function call, at the top, then
> >>>>> the VM freezes (quite understandably) when the first SMI is raised via
> >>>>> APM_CNT.
> >>>>>
> >>>>> If I, instead, bracket the function body with pause_all_vcpus() and
> >>>>> resume_all_vcpus(), like this:
> >>>>>
> >>>>> static void ich9_apm_broadcast_smi(void)
> >>>>> {
> >>>>>     CPUState *cs;
> >>>>>
> >>>>>     pause_all_vcpus();
> >>>>>     cpu_synchronize_all_states();
> >>>>>     CPU_FOREACH(cs) {
> >>>>>         X86CPU *cpu = X86_CPU(cs);
> >>>>>         CPUX86State *env = &cpu->env;
> >>>>>
> >>>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
> >>>>>             CPUClass *k = CPU_GET_CLASS(cs);
> >>>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> >>>>>
> >>>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
> >>>>>             continue;
> >>>>>         }
> >>>>>
> >>>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >>>>>     }
> >>>>>     resume_all_vcpus();
> >>>>> }
> >>>>>
> >>>>> then I see the following symptoms:
> >>>>> - rather than 4 VCPUs being pegged at 100%, only one VCPU is pegged at
> >>>>>   100%
> >>>>> - the boot process, as observed from the OVMF log, is just as slow
> >>>>>   (= crawling) as without the VCPU pausing/resuming (i.e., like with
> >>>>>   cpu_synchronize_all_states() only); so ultimately the
> >>>>>   pausing/resuming doesn't help.      
> >>>>
> >>>> I also tried this variant (bracketing only the sync call with 
> >>>> pause/resume):
> >>>>
> >>>> static void ich9_apm_broadcast_smi(void)
> >>>> {
> >>>>     CPUState *cs;
> >>>>
> >>>>     pause_all_vcpus();
> >>>>     cpu_synchronize_all_states();
> >>>>     resume_all_vcpus();
> >>>>     CPU_FOREACH(cs) {
> >>>>         X86CPU *cpu = X86_CPU(cs);
> >>>>         CPUX86State *env = &cpu->env;
> >>>>
> >>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
> >>>>             CPUClass *k = CPU_GET_CLASS(cs);
> >>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> >>>>
> >>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
> >>>>             continue;
> >>>>         }
> >>>>
> >>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >>>>     }
> >>>> }
> >>>>
> >>>> This behaves identically to the version where pause/resume bracket the
> >>>> entire function body (i.e., 1 VCPU pegged, super-slow boot progress).    
> >>> wrapping the entire function body in pause_all_vcpus()/resume_all_vcpus()
> >>> looks better, as cpu_interrupt() would then not need to send IPIs to the
> >>> CPUs, since pause_all_vcpus() would have done that already.
> >>
> >> I don't understand: why would a pause operation send IPIs?
> > pause_all_vcpus() forces an exit on all CPUs, and cpu_interrupt() does the
> > same to a single CPU, so if all CPUs have already been kicked out of the
> > running state, cpu_interrupt() can skip the kick-out step.
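
In code terms, the idea is roughly this (a sketch of the concept only,
not the actual cpu_interrupt() implementation):

static void cpu_interrupt_sketch(CPUState *cs, int mask)
{
    cs->interrupt_request |= mask;   /* record the pending interrupt */
    if (!cs->stopped) {
        /* Only a still-running vCPU must be forced out of guest mode;
         * a paused one will see the request when it resumes. */
        qemu_cpu_kick(cs);
    }
}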
> 
> Understood.
> 
> >   
> >> Do you mean the resume? Even in that case, why would the resume send
> >> precisely SMIs (substituting for the specific CPU_INTERRUPT_SMI calls)
> >> and not some other kind of interrupt?
> >>  
> >>> So the remaining question is what this one CPU is doing to make things
> >>> so slow. Does it send a lot of SMIs, or is it something else?
> >>
> >> The firmware uses many SMIs / Trigger() calls, for example for the
> >> implementation of UEFI variable services, and SMM lockbox operations.
> >> However, it does not use *masses* of them. Following the OVMF log, I see
> >> that *individual* APM_CNT writes take very long (on the order of a
> >> second, even), which implies that a single cpu_synchronize_all_states()
> >> function call takes very long.
> >>
> >> I briefly checked what cpu_synchronize_all_states() does, digging down
> >> its call stack. I don't see anything in it that we should or could
> >> modify for the sake of this work.
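
For reference, the call chain in question looks roughly like this
(reconstructed from memory of the QEMU sources of this period; worth
double-checking against the tree):

/*
 * cpu_synchronize_all_states()
 *   -> cpu_synchronize_state(cpu)          for each vCPU
 *      -> kvm_cpu_synchronize_state(cpu)   no-op if state already fetched
 *         -> run_on_cpu(cpu, do_kvm_cpu_synchronize_state, ...)
 *            -> kvm_arch_get_registers(cpu)   several KVM_GET_* ioctls
 */

run_on_cpu() queues a work item and waits for the target vCPU thread to
process it, one vCPU at a time, which is a plausible source of the cost.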
> >>
> >>
> >> I think the current line of investigation is way out of scope for this
> >> work, and that we've lost focus. In the original "PATCH v6 wave 2 2/3",
> >> cpu_interrupt() just works as intended.  
> 
> > It's not that I'm outright against v6 as is;
> > it just seems that something is fundamentally broken in the
> > cpu_synchronize_all_states() call chain,
> 
> Maybe, but I don't see how fixing that is a prerequisite for this patch.
> 
> > and that, instead of fixing the issue, the same SMBASE race case would
> > just be ignored,
> 
> I don't understand. What SMBASE race? Ignored by what and when? Sorry,
> I'm lost.
> 
> If you are saying that, for the moment we can ignore any SMBASE races
> (whatever they may be) because CPU hotplug is broken with OVMF anyway,
> then I agree. I would like to limit the scope of this series as much as
> possible.
> 
> > since CPU hotplug is
> > broken with OVMF anyway.
> > 
> > So I'm fine with applying v6 as is,
> > followed by a fix to cpu_synchronize_all_states()
> 
> I cannot promise to work on cpu_synchronize_all_states(). It ties into
> KVM and it goes over my head.
> 
> In fact, if there is a problem with cpu_synchronize_all_states(), it
> must be a general one.
> 
> The only reason I even thought of calling cpu_synchronize_all_states()
> here, after seeing the out-of-date register values, is that I recalled
> it from when I worked on dump-guest-memory. dump-guest-memory writes the
> register file to the dump, and so it needs cpu_synchronize_all_states().
> 
> But, again, cpu_synchronize_all_states() is a general facility, and I
> can't promise to work on it.
> 
> > and by
> > filtering out CPUs whose SMBASE is still at its reset value,
> > if possible in the 2.9 merge window.
> 
> I'm still not convinced that this filtering is absolutely necessary, in
> the face of feature negotiation; *but*, if someone figures out what's
> wrong with cpu_synchronize_all_states(), and manages to make its
> performance impact negligible, then I am certainly willing to add the
> filtering to the SMI broadcast.
> 
> At that point, it shouldn't be much work, I hope.
> 
> >   
> >> Why are we branching out this far from that? Just to avoid the CPU
> >> unpark feature that Paolo suggested (which would also be negotiated)?
> >>
> >> What is so wrong with the planned CPU unpark stuff that justifies
> >> blocking this patch?
> >>
> >> Honestly I don't understand why we can't approach these features
> >> *incrementally*. Let's do the negotiable SMI broadcast now, then do VCPU
> >> hotplug later. I think the current negotiation pattern is flexible and
> >> future-proofed enough for that. It was you who suggested, for
> >> simplifying the last patch in the series, that we completely ignore VCPU
> >> hotplug for now (and I whole-heartedly agreed).
> >>
> >> Let me put it like this: what is in this patch, in your opinion, that
> >> *irrevocably* breaks the upcoming VCPU hotplug feature, or else limits
> >> our options in implementing it? In my opinion, we can later implement
> >> *anything at all*, dependent on new feature bits. We can even decide to
> >> reject guest feature requests that specify *only* bit#0; which
> >> practically means disabling guests (using SMIs) that don't conform to
> >> our VCPU hotplug implementation.
> 
> > Why do negotiation if hotplug could be done without it?
> 
> Because it lets us segment the feature set into several small steps,
> where I have a chance of grasping the small pieces and maybe even
> contributing small, surgical patches?
> 
> You and Paolo might be seeing the big picture, and I certainly don't
> want to go *against* the big picture. I just want to advance in steps
> as small as possible.
> 
> There's the historical observation that a compiler has as many stages as
> there are sub-teams in the compiler team. (Please excuse me for not
> recalling the exact phrasing and the person who quipped it.) That
> observation exists for a reason. Feature and module boundaries tend to
> mirror human understanding.
> 
> > On the hw (QEMU)
> > side, hotplug seems to be implemented sufficiently (though we are still
> > missing SMRR support). The only thing needed to make hotplug work with
> > OVMF might be issuing an SMI, either directly from QEMU or from AML (a
> > write into APM_CNT), after a CPU is hotplugged.
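
Purely as an illustration of that idea (not existing QEMU code; the
function name is made up), the QEMU-side variant would amount to reusing
the same broadcast loop once the new CPU is wired up:

static void hotplug_fire_broadcast_smi(void)   /* hypothetical name */
{
    CPUState *cs;

    CPU_FOREACH(cs) {
        cpu_interrupt(cs, CPU_INTERRUPT_SMI);
    }
}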
> 
> Maybe. For me, that looms in the future.
> 
> If you agree with my participation as outlined above; that is,
> - I care about this exact patch, as posted,
> - someone else looks into cpu_synchronize_all_states(),
CCing Radim, who graciously agreed to take a look at what is wrong from
the KVM side. Could you give him the steps to reproduce the issue, please?

> - and then I'm willing to care about the incremental patch for the
>   filtering,
ok


> then I propose we go ahead with this patch. It's the last one in the
> series that needs your R-b.
Reviewed-by: Igor Mammedov <address@hidden>

> 
> Thanks
> Laszlo



