qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for 1.7] target-i386: yield to another VCPU on P


From: Luigi Rizzo
Subject: Re: [Qemu-devel] [PATCH for 1.7] target-i386: yield to another VCPU on PAUSE
Date: Wed, 20 Nov 2013 15:15:41 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Nov 20, 2013 at 12:54:02PM +0100, Paolo Bonzini wrote:
> After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
> or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.
> 
> The problem shows up as soon as FreeBSD turns out its periodic (~1 ms)
> tick, but the timers are only the trigger for a pre-existing problem.
> 
> Before the offending patch, setting a timer did a timer_settime system call.
> 
> After, setting the timer exits the event loop (which uses poll) and
> reenters it with a new deadline.  This does not cause any slowdown; the
> difference is between one system call (timer_settime and a signal
> delivery (SIGALRM) before the patch, and two system calls afterwards
> (write to a pipe or eventfd + calling poll again when re-entering the
> event loop).
> 
> Unfortunately, the exit/enter causes the main loop to grab the iothread
> lock, which in turns kicks the VCPU thread out of execution.  This
> causes TCG to execute the next VCPU in its round-robin scheduling of
> VCPUS.  When the second VCPU is mostly unused, FreeBSD runs a "pause"
> instruction in its idle loop which only burns cycles without any
> progress.  As soon as the timer tick expires, the first VCPU runs
> the interrupt handler but very soon it sets it again---and QEMU
> then goes back doing nothing in the second VCPU.
> 
> The fix is to make the pause instruction do "cpu_loop_exit".

thank you.

This fixes the slow booting problem, but the runtime performance
with my test program in the other mail thread

        "commit b1bbfe72 causes huge slowdown with no kvm"

is still about 50% than pre- commit b1bbfe7

So i am still wondering if there is a better way to deliver the
timerlist_notify() in timerlist_rearm() .

I have tried to suppress the entire timerlist_rearm() when the newly
inserted event is close to the previous one, but this does not
seem to help -- actually it is harmful, presumably because qemu_clock_warp()
is also skipped. Conversely, filtering out timerlist_notify() as in
my previous (incorrect) patch seems to speed up things considerably,
perhaps because there are other timerlist_notify() calls elsewhere
that keep the ball rolling.


Just as a side note:

I am not sure whether you were seeing use of the 'pause' instruction
in profiling, but by default the idle loop in FreeBSD uses the "acpi" method
(to enter C1 state).

Other options are:
- "mwait" (unavailable on qemu due to some missing VCPU feature);
- "hlt" (which i tried and gives the same behaviour as "acpi");
- "spin" (which indeed does use the "pause" instruction, but is not
  used unless you force it with "sysctl machdep.idle=spin").

"pause" instructions are however used within spinlocks, and when
invoking the scheduler, which happens right before and after the idle loop.

        cheers
        luigi

> Cc: Richard Henderson <address@hidden>
> Reported-by: Luigi Rizzo <address@hidden>
> Signed-off-by: Paolo Bonzini <address@hidden>
> ---
>  target-i386/helper.h      |  1 +
>  target-i386/misc_helper.c | 22 ++++++++++++++++++++--
>  target-i386/translate.c   |  5 ++++-
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/target-i386/helper.h b/target-i386/helper.h
> index d6974df..3775abe 100644
> --- a/target-i386/helper.h
> +++ b/target-i386/helper.h
> @@ -58,6 +58,7 @@ DEF_HELPER_2(sysret, void, env, int)
>  DEF_HELPER_2(hlt, void, env, int)
>  DEF_HELPER_2(monitor, void, env, tl)
>  DEF_HELPER_2(mwait, void, env, int)
> +DEF_HELPER_2(pause, void, env, int)
>  DEF_HELPER_1(debug, void, env)
>  DEF_HELPER_1(reset_rf, void, env)
>  DEF_HELPER_3(raise_interrupt, void, env, int, int)
> diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
> index 93933fd..b6307ca 100644
> --- a/target-i386/misc_helper.c
> +++ b/target-i386/misc_helper.c
> @@ -566,6 +566,15 @@ void helper_rdmsr(CPUX86State *env)
>  }
>  #endif
>  
> +static void do_pause(X86CPU *cpu)
> +{
> +    CPUX86State *env = &cpu->env;
> +
> +    /* Just let another CPU run.  */
> +    env->exception_index = EXCP_INTERRUPT;
> +    cpu_loop_exit(env);
> +}
> +
>  static void do_hlt(X86CPU *cpu)
>  {
>      CPUState *cs = CPU(cpu);
> @@ -611,13 +620,22 @@ void helper_mwait(CPUX86State *env, int next_eip_addend)
>      cs = CPU(cpu);
>      /* XXX: not complete but not completely erroneous */
>      if (cs->cpu_index != 0 || CPU_NEXT(cs) != NULL) {
> -        /* more than one CPU: do not sleep because another CPU may
> -           wake this one */
> +        do_pause(cpu);
>      } else {
>          do_hlt(cpu);
>      }
>  }
>  
> +void helper_pause(CPUX86State *env, int next_eip_addend)
> +{
> +    X86CPU *cpu = x86_env_get_cpu(env);
> +
> +    cpu_svm_check_intercept_param(env, SVM_EXIT_PAUSE, 0);
> +    env->eip += next_eip_addend;
> +
> +    do_pause(cpu);
> +}
> +
>  void helper_debug(CPUX86State *env)
>  {
>      env->exception_index = EXCP_DEBUG;
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index eb0ea93..ecf16b3 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -7224,7 +7224,10 @@ static target_ulong disas_insn(CPUX86State *env, 
> DisasContext *s,
>              goto do_xchg_reg_eax;
>          }
>          if (prefixes & PREFIX_REPZ) {
> -            gen_svm_check_intercept(s, pc_start, SVM_EXIT_PAUSE);
> +            gen_update_cc_op(s);
> +            gen_jmp_im(pc_start - s->cs_base);
> +            gen_helper_pause(cpu_env, tcg_const_i32(s->pc - pc_start));
> +            s->is_jmp = DISAS_TB_JUMP;
>          }
>          break;
>      case 0x9b: /* fwait */
> -- 
> 1.8.4.2
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]