[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all
From: |
Alberto Garcia |
Subject: |
Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs |
Date: |
Thu, 30 Nov 2017 15:35:11 +0100 |
User-agent: |
Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu) |
On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:
>> Destroying a paused block job during bdrv_reopen_multiple() has two
>> consequences:
>>
>> 1) The references to the nodes involved in the job are released,
>> possibly destroying some of them. If those nodes were in the
>> reopen queue this would trigger the problem originally described
>> in commit 40840e419be, crashing QEMU.
>
> This specific problem could be avoided by making the BDS reference in
> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
> and bdrv_unref() only at the end of bdrv_reopen_multiple().
That is correct.
>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>> not be doing all necessary bdrv_parent_drained_end() calls.
>
> If I understand correctly, you don't have a reproducer here.
That's unfortunately not correct.
You can use the very test case that I mentioned in the commit message:
https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
With that one, QEMU master crashes easily because of problem (1). If I
hold strong references in the reopen queue as you mentioned, the test
case hangs because of problem (2).
> It's certainly not a full solution because keeping a reference to a
> block job does not prevent it from completing, but only from being
> freed. Most block jobs do graph modifications, including dropping the
> references to nodes, already when they complete, not only when they
> are freed.
Yes but the block job itself holds additional references (thanks to
block_job_add_bdrv()).
> Now, while the above might have sounded like we have an easy solution
> in bdrv_reopen(), I see another problem that looks quite tough:
>
> 3) bdrv_reopen_queue() is a recursive operation that involves all
> children that inherited options. If the graph changes between the
> bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
> calculated options (and today possibly permissions) don't actually
> match the graph any more.
>
> The requirement we really have is that between bdrv_reopen_queue()
> and bdrv_reopen_multiple() no graph changes are made. Calling
> bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.
Yes I agree that that should be the requirement.
Berto