qemu-devel

Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle


From: Peter Xu
Subject: Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
Date: Thu, 20 Jan 2022 17:25:51 +0800

On Thu, Jan 20, 2022 at 04:26:09PM +0800, Hyman Huang wrote:
> Hi, Peter. I'm working on this problem and found that the reason is much
> the same as what I mentioned in the cover letter of v10; the following is
> what I posted:
> 
>   2. The new implementation of the throttle algorithm, inspired by Peter,
>      responds faster and consumes less CPU resource than the old one;
>      we have made impressive progress.
> 
>      And there is a point that may be worth discussing: the new
>      throttle logic is "passive", i.e. a vcpu sleeps only after its
>      dirty ring is full, unlike "auto-converge", which kicks the vcpu
>      at fixed time slices. If the vcpu is memory-write intensive and
>      the ring size is large, it keeps producing dirty memory until the
>      ring fills up, so the throttle does not work so well; in other
>      words, the throttle depends on the dirty ring size.
> 
>      I actually tested the new algorithm in two cases:
> 
>      case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
>      result: minimum quota dirtyrate is 25MB/s or even less
>              minimum vcpu util is 6%
> 
>      case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
>      result: minimum quota dirtyrate is 256MB/s
>              minimum vcpu util is 24%
> 
>      I post this just for discussion; I think this is not a big deal
>      because if we set the dirty-ring-size to the maximum value (65536),
>      we assume the server's bandwidth is capable of handling it.
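
For reference, assuming 4KiB guest pages and that a vcpu is only put to sleep
once per ring-full exit, the ring size translates into a per-exit dirty budget
roughly like this:

    4096 entries  * 4KiB/page ~=  16MiB dirtied between two ring-full exits
    65536 entries * 4KiB/page ~= 256MiB dirtied between two ring-full exits

If each sleep is on the order of a second, that would explain why the lowest
reachable quota dirty rate scales with the ring size (25MB/s vs 256MB/s above).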

My memory is that I tested your v10 (which has this wait-at-ring-full logic)
already and at that time it worked well.

It's possible that I just got lucky with v10, so the problem may only show up
under some random conditions, and so far we still don't know how to hit it.

However..

> 
> Currently, QEMU handles the vcpu KVM_EXIT_DIRTY_RING_FULL exit as follows:
> 
> 1. If the dirty ring of a vcpu is full, the vcpu thread returns to user
> space and QEMU handles it.
> 
> 2. QEMU takes the kvm_slots_lock and reaps the dirty rings of all vcpus in
> one go by calling kvm_dirty_ring_reap, filling the dirty page bitmap of each
> slot and resetting the dirty rings. Finally it releases the kvm_slots_lock.
> 
> The logic of *reaping and resetting the dirty rings of all vcpus once one
> vcpu's dirty ring is full* works fine and is efficient.
> 
> But this is not what dirtylimit wants, because some of the vcpus may lose
> the chance to sleep and cannot be throttled, even though their dirty rings
> were full.
> 
> Your latest test environment used a larger guest (20G, 40 cores), which
> increases the chance that a vcpu misses its sleep, so the throttle does not
> work as well as before.
> 
> I tried a simple modification to make the throttle work as well as before:
> 
> +static void kvm_dirty_ring_reset_one(KVMState *s, CPUState *cpu)
> +{
> +    int ret;
> +    uint64_t total = 0;
> +
> +    kvm_slots_lock();
> +    total = kvm_dirty_ring_reap_one(s, cpu);
> +
> +    if (total) {
> +        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> +        assert(ret == total);
> +    }
> +
> +    kvm_slots_unlock();
> +}
> +
>  static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg)
>  {
>      /* No need to do anything */
> @@ -2309,6 +2327,11 @@ bool kvm_dirty_ring_enabled(void)
>      return kvm_state->kvm_dirty_ring_size ? true : false;
>  }
> 
>  static int kvm_init(MachineState *ms)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2955,9 +2978,8 @@ int kvm_cpu_exec(CPUState *cpu)
>               * still full.  Got kicked by KVM_RESET_DIRTY_RINGS.
>               */
>              trace_kvm_dirty_ring_full(cpu->cpu_index);
> -            qemu_mutex_lock_iothread();
> -            kvm_dirty_ring_reap(kvm_state);
> -            qemu_mutex_unlock_iothread();
> +            kvm_dirty_ring_reset_one(kvm_state, cpu);
> +            dirtylimit_vcpu_execute(cpu);
>              ret = 0;
>              break;
>          case KVM_EXIT_SYSTEM_EVENT:
> 
> I drop the BQL to reduce the overhead of handling the
> KVM_EXIT_DIRTY_RING_FULL exit. Maybe kvm_state should be protected by the
> BQL, but I wonder if there could be a finer-grained lock.
> 
> How about this?

... I think what you explained makes sense to me.

Note that there's also the reaper thread running in the background that can
reap all the cores too.

It only runs once per second so it shouldn't make much of a difference, but
I'm also wondering whether we should temporarily turn it off when dirtylimit
is enabled - we can simply let it keep sleeping while dirtylimit is in
service.
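
Something like the sketch below, assuming the series exposes a query such as
dirtylimit_in_service() (the name is just a placeholder); it's a trimmed-down
version of the reaper loop in accel/kvm/kvm-all.c with the state tracking and
trace points left out:

static void *kvm_dirty_ring_reaper_thread(void *data)
{
    KVMState *s = data;

    while (true) {
        /* The reaper wakes up once per second. */
        sleep(1);

        /*
         * Sketch: skip the periodic reap while the dirty limit throttle
         * is in service, so that the per-vcpu ring-full exits remain the
         * only place where rings are reaped and vcpus are put to sleep.
         */
        if (dirtylimit_in_service()) {
            continue;
        }

        qemu_mutex_lock_iothread();
        kvm_dirty_ring_reap(s);
        qemu_mutex_unlock_iothread();
    }

    return NULL;
}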

Dropping BQL may not be safe, as it serializes the reaping with other possible
kvm memslot updates.  I don't know whether it's a must in the future to use BQL
for reaping the rings, but so far I'd say we can still stick with it.

Note that even if you don't take the BQL you'll still need the slots_lock, and
so far that's also global, so I don't see how it would help vcpu concurrency
even if we dropped one of them.

If we go this way, could you not introduce kvm_dirty_ring_reset_one(), but
instead let the existing reap path take one more CPUState* parameter?  Most of
the code you added should be similar to kvm_dirty_ring_reap_locked(), and I'd
like to keep the trace point there (trace_kvm_dirty_ring_reap, though that
needs another parameter too).
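
Roughly something like this (only a sketch, not tested; it keeps the existing
kvm_dirty_ring_reap_one() helper, treats a NULL cpu as "reap all vcpus", and
the trace point would still need to learn about the extra parameter somehow):

/* Must be called with kvm_slots_lock held; cpu == NULL means all vcpus. */
static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState *cpu)
{
    int ret;
    uint64_t total = 0;
    int64_t stamp = get_clock();
    CPUState *c;

    if (cpu) {
        total = kvm_dirty_ring_reap_one(s, cpu);
    } else {
        CPU_FOREACH(c) {
            total += kvm_dirty_ring_reap_one(s, c);
        }
    }

    if (total) {
        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
        assert(ret == total);
        /* One trace point for both the per-vcpu and the all-vcpu case. */
        trace_kvm_dirty_ring_reap(total, get_clock() - stamp);
    }

    return total;
}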

And that patch can be done on top of this one, so it can be reviewed more
easily, outside of the dirtylimit details.

Thanks,

-- 
Peter Xu