Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work

From:	Alex Bennée
Subject:	Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work
Date:	Mon, 24 Jun 2019 11:58:23 +0100
User-agent:	mu4e 1.3.2; emacs 26.1

Roman Kagan <address@hidden> writes:

> It was introduced in commit b129972c8b41e15b0521895a46fd9c752b68a5e,
> with the following motivation:

I can't find this commit in my tree.

>
>   Because start_exclusive uses CPU_FOREACH, merge exclusive_lock with
>   qemu_cpu_list_lock: together with a call to exclusive_idle (via
>   cpu_exec_start/end) in cpu_list_add, this protects exclusive work
>   against concurrent CPU addition and removal.
>
> However, it seems to be redundant, because the cpu-exclusive
> infrastructure provides suffificent protection against the newly added
> CPU starting execution while the cpu-exclusive work is running, and the
> aforementioned traversing of the cpu list is protected by
> qemu_cpu_list_lock.
>
> Besides, this appears to be the only place where the cpu-exclusive
> section is entered with the BQL taken, which has been found to trigger
> AB-BA deadlock as follows:
>
>     vCPU thread                             main thread
>     -----------                             -----------
> async_safe_run_on_cpu(self,
>                       async_synic_update)
> ...                                         [cpu hot-add]
> process_queued_cpu_work()
>   qemu_mutex_unlock_iothread()
>                                             [grab BQL]
>   start_exclusive()                         cpu_list_add()
>   async_synic_update()                        finish_safe_work()
>     qemu_mutex_lock_iothread()                  cpu_exec_start()
>
> So remove it.  This paves the way to establishing a strict nesting rule
> of never entering the exclusive section with the BQL taken.
>
> Signed-off-by: Roman Kagan <address@hidden>
> ---
>  cpus-common.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 3ca58c64e8..023cfebfa3 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -69,12 +69,6 @@ static int cpu_get_free_index(void)
>      return cpu_index;
>  }
>
> -static void finish_safe_work(CPUState *cpu)
> -{
> -    cpu_exec_start(cpu);
> -    cpu_exec_end(cpu);
> -}
> -

This makes sense to me intellectually but I'm worried I've missed the
reason for it being introduced. Without finish_safe_work we have to wait
for the actual vCPU thread function to acquire and release the BQL and
enter it's first cpu_exec_start().

I guess I'd be happier if we had a hotplug test where we could stress
test the operation and be sure we've not just moved the deadlock
somewhere else.

>  void cpu_list_add(CPUState *cpu)
>  {
>      qemu_mutex_lock(&qemu_cpu_list_lock);
> @@ -86,8 +80,6 @@ void cpu_list_add(CPUState *cpu)
>      }
>      QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
>      qemu_mutex_unlock(&qemu_cpu_list_lock);
> -
> -    finish_safe_work(cpu);
>  }
>
>  void cpu_list_remove(CPUState *cpu)


--
Alex Bennée

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work, Alex Bennée <=
- Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work, Roman Kagan, 2019/06/24
  - Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work, Alex Bennée, 2019/06/24

Prev by Date: Re: [Qemu-devel] [libvirt] [PATCH] deprecate -mem-path fallback to anonymous RAM
Next by Date: Re: [Qemu-devel] [PULL 03/25] i386/kvm: convert hyperv enlightenments properties from bools to bits
Previous by thread: Re: [Qemu-devel] ?==?utf-8?q? [PATCH v3 0/8] target/ppc: Optimize emulation of some Altivec
Next by thread: Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work
Index(es):
- Date
- Thread