qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit after


From: 仇大玉
Subject: RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit after snapshot
Date: Thu, 28 Jan 2021 05:16:09 +0000

Any comments?

-----Original Message-----
From: 08005325@163.com <08005325@163.com> 
Sent: 2021年1月28日 9:31
To: kwolf@redhat.com; mreitz@redhat.com; jsnow@redhat.com
Cc: qemu-block@nongnu.org; qemu-devel@nongnu.org; 仇大玉 <qiudayu@huayun.com>
Subject: [PATCH v4] blockjob: Fix crash with IOthread when block commit after 
snapshot

From: Michael Qiu <qiudayu@huayun.com>

v4: rebase to latest code

v3: reformat the commit log, remove duplicate content

v2: modify the coredump backtrace within commit log with the newest
    qemu with master branch

Currently, if guest has workloads, IO thread will acquire aio_context lock 
before do io_submit, it leads to segmentfault when do block commit after 
snapshot. Just like below:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7c7d91f700 (LWP 99907)] 0x00005576d0f65aab in 
bdrv_mirror_top_pwritev at ../block/mirror.c:1437
1437    ../block/mirror.c: No such file or directory.
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false

(gdb) bt

Switch to qemu main thread:
/lib/../lib64/libpthread.so.0
/lib/../lib64/libpthread.so.0
../util/qemu-thread-posix.c:79
qapi/qapi-commands-block-core.c:346
../qapi/qmp-dispatch.c:110
/lib/../lib64/libglib-2.0.so.0

In IO thread when do bdrv_mirror_top_pwritev, the job is NULL, and stop field 
is false, this means the MirrorBDSOpaque "s" object has not been initialized 
yet, and this object is initialized by block_job_create(), but the initialize 
process is stuck in acquiring the lock.

The rootcause is that qemu do release/acquire when hold the lock, at the same 
time, IO thread get the lock after release stage, and the crash occured.

Actually, in this situation, job->job.aio_context will not equal to 
qemu_get_aio_context(), and will be the same as bs->aio_context, thus, no need 
to release the lock, becasue bdrv_root_attach_child() will not change the 
context.

This patch fix this issue.

Signed-off-by: Michael Qiu <qiudayu@huayun.com>
---
 blockjob.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 98ac8af982..51a09f3b60 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name, 
BlockDriverState *bs,
     BdrvChild *c;
 
     bdrv_ref(bs);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_release(job->job.aio_context);
     }
     c = bdrv_root_attach_child(bs, name, &child_job, 0,
                                job->job.aio_context, perm, shared_perm, job,
                                errp);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_acquire(job->job.aio_context);
     }
     if (c == NULL) {
--
2.22.0



reply via email to

[Prev in Thread] Current Thread [Next in Thread]