Re: [Qemu-block] [PATCH] qcow2: always initialize specific image info


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH] qcow2: always initialize specific image info
Date: Tue, 22 Dec 2015 00:05:26 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 21.12.2015 22:33, Denis V. Lunev wrote:
> On 12/21/2015 04:18 PM, Denis V. Lunev wrote:
>> On 12/07/2015 08:07 PM, Roman Kagan wrote:
>>> qcow2_get_specific_info() used to have a code path which would leave
>>> pointer to ImageInfoSpecificQCow2 uninitialized.
>>>
>>> We guess that it caused sporadic crashes on freeing an invalid pointer
>>> in response to "query-block" QMP command in
>>> visit_type_ImageInfoSpecificQCow2 with QapiDeallocVisitor.
>>>
>>> Although we have neither a solid proof nor a reproduction scenario,
>>> making sure the field is initialized appears a reasonable thing to do.
>>>
>>> Signed-off-by: Roman Kagan <address@hidden>
>>> ---
>>>   block/qcow2.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 88f56c8..67c9d3d 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -2739,7 +2739,7 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
>>>         *spec_info = (ImageInfoSpecific){
>>>           .type  = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
>>> -        .u.qcow2 = g_new(ImageInfoSpecificQCow2, 1),
>>> +        .u.qcow2 = g_new0(ImageInfoSpecificQCow2, 1),
>>>       };
>>>       if (s->qcow_version == 2) {
>>>           *spec_info->u.qcow2 = (ImageInfoSpecificQCow2){
>>
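
For context on why the change from g_new() to g_new0() matters here:
g_new() returns uninitialized memory, while g_new0() zero-fills it, so
with g_new() any pointer member that is never assigned afterwards holds
garbage and crashes once it is freed. A minimal, hypothetical
illustration of that hazard (not the actual qcow2 code):

#include <glib.h>

typedef struct {
    char *compat;               /* freed later, e.g. by a dealloc visitor */
} Info;

int main(void)
{
    Info *a = g_new(Info, 1);   /* a->compat is indeterminate garbage */
    Info *b = g_new0(Info, 1);  /* b->compat is NULL                  */

    /* g_free(a->compat);          freeing garbage - possible crash   */
    g_free(b->compat);          /* freeing NULL is a safe no-op       */

    g_free(a);
    g_free(b);
    return 0;
}
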
>> It looks like I have caught this (on QEMU 2.3 RHEV), obtained from here:
>> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.3.0-31.el7_2.3.src.rpm
>>
>>
>> Here is a dump
>>
>>
>> (gdb) p *blk_backends.tqh_first->bs
>> $12 = {total_sectors = 134217728, read_only = 0, open_flags = 24802,
>>   encrypted = 0, valid_key = 0, sg = 0, copy_on_read = 0, probed = false,
>>   drv = 0x7fe0c047ec40 <bdrv_qcow2>, opaque = 0x7fe0c23acdb0,
>>   blk = 0x7fe0c23a4640, aio_context = 0x7fe0c237f350, aio_notifiers = {
>>     lh_first = 0x0},
>>   filename = "/vz/vmprivate/VM_a80bf3e5.pvm/harddisk.hdd", '\000'
>> <repeats 4053 times>, backing_file = '\000' <repeats 4095 times>,
>>   backing_format = '\000' <repeats 15 times>,
>>   full_open_options = 0x7fe0c2513e30,
>>   exact_filename = "/vz/vmprivate/VM_a80bf3e5.pvm/harddisk.hdd",
>> '\000' <repeats 4053 times>, backing_hd = 0x0, file = 0x7fe0c23a9a50,
>> close_notifiers = {
>>     notifiers = {lh_first = 0x0}}, before_write_notifiers = {notifiers
>> = {
>>       lh_first = 0x0}}, serialising_in_flight = 0, throttle_state =
>> {cfg = {
>>       buckets = {{avg = 0, max = 0, level = 0}, {avg = 0, max = 0,
>> level = 0},
>>         {avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0}, {
>>           avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0}},
>>       op_size = 0}, previous_leak = 0, timers = {0x0, 0x0},
>>     clock_type = QEMU_CLOCK_REALTIME, read_timer_cb = 0x0,
>>     write_timer_cb = 0x0, timer_opaque = 0x0}, throttled_reqs =
>> {{entries = {
>>         tqh_first = 0x0, tqh_last = 0x7fe0c23a7810}}, {entries = {
>>         tqh_first = 0x0, tqh_last = 0x7fe0c23a7820}}},
>>   io_limits_enabled = false, stats = {nr_bytes = {0, 0, 0}, nr_ops =
>> {0, 0,
>>       0}, total_time_ns = {0, 0, 0}, merged = {0, 0, 0},
>>     wr_highest_sector = 0}, bl = {max_discard = 0, discard_alignment = 0,
>>     max_write_zeroes = 0, write_zeroes_alignment = 128,
>>     opt_transfer_length = 0, max_transfer_length = 0,
>>     opt_mem_alignment = 512}, zero_beyond_eof = true,
>> request_alignment = 512,
>>   guest_block_size = 512, enable_write_cache = 1,
>>   on_read_error = BLOCKDEV_ON_ERROR_REPORT,
>>   on_write_error = BLOCKDEV_ON_ERROR_ENOSPC, iostatus_enabled = true,
>>   iostatus = BLOCK_DEVICE_IO_STATUS_OK, node_name = '\000' <repeats 31
>> times>,
>>   node_list = {tqe_next = 0x0, tqe_prev = 0x0}, device_list = {
>>     tqe_next = 0x7fe0c2514e50, tqe_prev = 0x7fe0c047cff0 <bdrv_states>},
>>   dirty_bitmaps = {lh_first = 0x0}, refcnt = 1, tracked_requests = {
>>     lh_first = 0x0}, op_blockers = {{lh_first = 0x0} <repeats 15 times>},
>>   job = 0x0, options = 0x7fe0c23a2ca0,
>>   detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, backing_blocker
>> = 0x0,
>>   write_threshold_offset = 0, write_threshold_notifier = {notify = 0x0,
>>     node = {le_next = 0x0, le_prev = 0x0}}}
>>
>>
>> (gdb) p *(BDRVQcowState*)blk_backends.tqh_first->bs->opaque
>> $13 = {cluster_bits = 0, cluster_size = 0, cluster_sectors = 0,
>> l2_bits = 0,
>>   l2_size = 0, l1_size = 0, cluster_offset_mask = 0, l1_table_offset = 0,
>>   l1_table = 0x0, l2_cache = 0x0, l2_cache_offsets = {0 <repeats 16
>> times>},
>>   l2_cache_counts = {0 <repeats 16 times>}, cluster_cache = 0x0,
>>   cluster_data = 0x0, cluster_cache_offset = 0, crypt_method = 0,
>>   crypt_method_header = 0, aes_encrypt_key = {rd_key = {0 <repeats 60
>> times>},
>>     rounds = 0}, aes_decrypt_key = {rd_key = {0 <repeats 60 times>},
>>     rounds = 0}, lock = {locked = false, queue = {entries = {tqh_first
>> = 0x0,
>>         tqh_last = 0x0}}}, migration_blocker = 0x0}
>> (gdb)
>>
>>
>> As you can see, all fields are zeroes.
>>
>> Technically this could happen in the middle of qcow2_invalidate_cache(),
>> which, in turn, is called from the migration code, e.g. during resume
>> from suspend (which is how this is implemented in libvirt).
>>
>> The migration code is actually running at this moment, as far as can be
>> seen in the coredump.
>>
>> Any thoughts? :)
>>
>> Den
>>
> The situation is even worse than I feared.
> 
> 1) The hypothesis is correct. We are in the middle of
>     qcow2_invalidate_cache(), which is called from here:
> static void process_incoming_migration_co(void *opaque)
> {
>      ...
>     /* Make sure all file formats flush their mutable metadata */
>     bdrv_invalidate_cache_all(&local_err);
> 
> 2) This code is problematic, though: the coroutine context is switched
>     inside, since there are a lot of places doing I/O inside.
> 
> 3) An incoming qmp_query_block request is then processed from the
>     monitor context. Even a change like
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 8a19aed..8a07d75 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -368,7 +368,12 @@ BlockInfoList *qmp_query_block(Error **errp)
> 
>      for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
>          BlockInfoList *info = g_malloc0(sizeof(*info));
> +        AioContext *ctx = blk_get_aio_context(blk);
> +
> +        aio_context_acquire(ctx);
>          bdrv_query_info(blk, &info->value, &local_err);
> +        aio_context_release(ctx);
> +
>          if (local_err) {
>              error_propagate(errp, local_err);
>              goto err;
> 
> will not help us, as the thread is the same! We would just increase the
> nesting count in rfifolock_lock() and proceed inside to the crash.
> 
> 4) Locking with the coroutine lock in qcow2_get_specific_info()
>     will not help either - the lock is initialized in qcow2_open()
>     inside the invalidate-cache path and is taken/dropped several
>     times there as well.
> 
> :(((
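
For reference on point (3): aio_context_acquire() is built on a
recursive FIFO lock, so acquiring it again from the thread that already
holds it merely bumps a nesting counter and proceeds, which is why it
cannot protect against re-entrancy from the monitor in the same thread.
A minimal sketch of that behaviour (a simplified model, not the actual
rfifolock code):

#include <glib.h>

typedef struct {
    GMutex mutex;
    GCond cond;
    GThread *owner;     /* thread currently holding the lock, or NULL */
    unsigned nesting;   /* how many times the owner has acquired it   */
} RecLock;

static void rec_lock(RecLock *l)
{
    g_mutex_lock(&l->mutex);
    if (l->owner == g_thread_self()) {
        l->nesting++;                   /* same thread: nest and go on */
    } else {
        while (l->owner != NULL) {      /* other thread: really wait   */
            g_cond_wait(&l->cond, &l->mutex);
        }
        l->owner = g_thread_self();
        l->nesting = 1;
    }
    g_mutex_unlock(&l->mutex);
}

static void rec_unlock(RecLock *l)
{
    g_mutex_lock(&l->mutex);
    if (--l->nesting == 0) {
        l->owner = NULL;
        g_cond_signal(&l->cond);
    }
    g_mutex_unlock(&l->mutex);
}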

Thanks for finding this!

Well, if qcow2_get_specific_info() is the only place that can actually
cause issues in that case (i.e. calling some QMP function which uses the
qcow2 image while the incoming_migration coroutine yields before the
image has been fully reopened), I think the simplest way to fix this
would be to just return NULL from qcow2_get_specific_info() in the else
branch (which currently aborts), adding a comment explaining how we can
end up there.
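
Roughly something like this for the else branch (an untested sketch; it
assumes the generated qapi_free_ImageInfoSpecific() helper is used to
drop the partially filled structure):

    } else {
        /* Can happen while qcow2_invalidate_cache() has zeroed the
         * BDRVQcow2State and not yet reopened the image; there is no
         * valid information to report at this point. */
        qapi_free_ImageInfoSpecific(spec_info);
        return NULL;
    }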

However, it seems hard to believe this is the only problematic path...
If the coroutine can yield between the BDRVQcow2State getting memset()
to 0 and qcow2_open() having initialized it again, then any QMP command
which makes use of the qcow2 image should fail (not necessarily
gracefully) at that point.
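
To make the window explicit, this is roughly the shape of
qcow2_invalidate_cache() under discussion (a simplified sketch from
memory, not the literal code; the options handling in particular is
approximated):

static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    BDRVQcow2State *s = bs->opaque;
    QDict *options = qdict_clone_shallow(bs->options);

    qcow2_close(bs);                        /* tear down the old state     */
    memset(s, 0, sizeof(BDRVQcow2State));   /* qcow_version etc. are now 0 */

    /*
     * qcow2_open() does I/O and may yield the coroutine.  A QMP command
     * (e.g. query-block -> qcow2_get_specific_info()) dispatched from the
     * monitor in the same thread during that window sees the zeroed state.
     */
    if (qcow2_open(bs, options, bs->open_flags, errp) < 0) {
        /* error handling elided in this sketch */
    }
    QDECREF(options);
}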

I feel like qcow2_invalidate_cache() is the root cause. Basically
completely reopening the qcow2 image just feels wrong. Maybe we could at
least atomically replace the BDRVQcow2State, i.e. call qcow2_open()
first (but with a new BDRVQcow2State), then swap out the BDRVQcow2State
in the BDS, and then call qcow2_close() on the old BDRVQcow2State. That
still doesn't feel quite right, but it should fix this issue, at least.
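
In pseudo-C, the intent would be roughly the following (just a sketch;
qcow2_open_into() and qcow2_close_state() are hypothetical refactored
helpers operating on a caller-provided state instead of bs->opaque):

static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    BDRVQcow2State *old_s = bs->opaque;
    BDRVQcow2State *new_s = g_new0(BDRVQcow2State, 1);
    Error *local_err = NULL;

    /* 1. Build the fresh state first; the old one stays intact and is
     *    what e.g. query-block sees if the coroutine yields here.      */
    qcow2_open_into(bs, new_s, bs->open_flags, &local_err); /* hypothetical */
    if (local_err) {
        error_propagate(errp, local_err);
        g_free(new_s);
        return;
    }

    /* 2. Swap the state pointer; no yield happens between these two
     *    statements, so readers see either the old or the new state.   */
    bs->opaque = new_s;

    /* 3. Tear down and free the old state.                             */
    qcow2_close_state(bs, old_s);                            /* hypothetical */
    g_free(old_s);
}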

Max
