Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)


From: Alberto Garcia
Subject: Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)
Date: Wed, 15 Nov 2017 16:42:56 +0100
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Fri 10 Nov 2017 04:02:23 AM CET, Fam Zheng wrote:
>> > I'm thinking that perhaps we should add the pause point directly to
>> > block_job_defer_to_main_loop(), to prevent any block job from
>> > running the exit function when it's paused.
>> 
>> I was trying this and unfortunately this breaks the mirror job at
>> least (reproduced with a simple block-commit on the topmost node,
>> which uses commit_active_start() -> mirror_start_job()).
>> 
>> So what happens is that mirror_run() always calls
>> bdrv_drained_begin() before returning, pausing the block job. The
>> closing bdrv_drained_end() call is at the end of mirror_exit(),
>> already in the main loop.
>> 
>> So the mirror job is always calling block_job_defer_to_main_loop()
>> and mirror_exit() when the job is paused.
>
> FWIW, I think Max's report on 194 failures is related:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01822.html
>
> so perhaps it's worth testing his patch too:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01835.html

Well, that doesn't solve the original crash with parallel block jobs.
The root of the crash is that the mirror job manipulates the graph
_while it is paused_, so the BlockReopenQueue in bdrv_reopen_multiple()
ends up with stale entries, and pausing the jobs (commit 40840e419be31e)
won't help.

I have the impression that one major source of headaches is the fact
that the reopen queue contains nodes that don't need to be reopened at
all. Ideally this should be detected early on in bdrv_reopen_queue(), so
there's no chance that the queue contains nodes used by a different
block job. If we had that then op blockers should be enough to prevent
these things. Or am I missing something?
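
A minimal sketch of that early filtering, under loud assumptions: Node,
ReopenQueue, FLAG_RDWR, needs_reopen(), reopen_queue_add() and
queue_contains() are hypothetical, simplified stand-ins made up for
illustration and are not the QEMU API. The real bdrv_reopen_queue() also
deals with per-node options, which this ignores; here "needs reopening"
only means "the requested flags differ from the current ones".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit, not a real QEMU constant. */
#define FLAG_RDWR 0x1

/* Hypothetical, simplified node in a backing chain. */
typedef struct Node {
    const char *name;
    int flags;              /* flags the node is currently open with */
    struct Node *backing;   /* next node down the chain, or NULL */
} Node;

typedef struct ReopenEntry {
    Node *node;
    int new_flags;
} ReopenEntry;

#define MAX_ENTRIES 16

/* Hypothetical fixed-size queue, standing in for a reopen queue. */
typedef struct ReopenQueue {
    ReopenEntry entries[MAX_ENTRIES];
    size_t len;
} ReopenQueue;

/* Assumption: a node needs reopening only if its flags would change. */
static bool needs_reopen(const Node *n, int new_flags)
{
    return n->flags != new_flags;
}

/*
 * Walk the chain starting at 'top' and queue only the nodes that
 * actually need to change. Nodes already open with 'new_flags' are
 * skipped, so they never appear in the queue at all.
 */
static void reopen_queue_add(ReopenQueue *q, Node *top, int new_flags)
{
    for (Node *n = top; n != NULL; n = n->backing) {
        if (!needs_reopen(n, new_flags)) {
            continue;
        }
        assert(q->len < MAX_ENTRIES);
        q->entries[q->len].node = n;
        q->entries[q->len].new_flags = new_flags;
        q->len++;
    }
}

/* Helper to check whether a given node ended up in the queue. */
static bool queue_contains(const ReopenQueue *q, const Node *n)
{
    for (size_t i = 0; i < q->len; i++) {
        if (q->entries[i].node == n) {
            return true;
        }
    }
    return false;
}
```

With a chain top -> mid -> base where only mid is already open with
FLAG_RDWR, calling reopen_queue_add() on top with FLAG_RDWR queues top
and base and leaves mid out, so a job that owns mid would never see it
in the queue.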

Berto


