From: Max Reitz
Subject: Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
Date: Wed, 28 Mar 2018 16:53:57 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote:
> 27.03.2018 12:53, Vladimir Sementsov-Ogievskiy wrote:
>> 27.03.2018 12:28, Vladimir Sementsov-Ogievskiy wrote:
>>> 26.03.2018 21:06, Max Reitz wrote:
>>>> On 2018-03-20 18:05, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all.
>>>>>
>>>>> This fixes bitmaps migration through shared storage. Look at 02 for
>>>>> details.
>>>>>
>>>>> The bug introduced in 2.10 with the whole qcow2 bitmaps feature, so
>>>>> qemu-stable in CC. However I doubt that someone really suffered
>>>>> from this.
>>>>>
>>>>> Do we need dirty bitmaps at all in inactive case? - that was a
>>>>> question in v2.
>>>>> And, keeping in mind that we are going to use inactive mode not
>>>>> only for
>>>>> incoming migration, I'm not sure that answer is NO (but, it may be
>>>>> "NO" for
>>>>> 2.10, 2.11), so let's fix it in proposed here manner at least for
>>>>> 2.12.
>>>> For some reason, I can't get 169 to work now at all[1]. What's more,
>>>> whenever I run it, two (on current master, maybe more after this
>>>> series)
>>>> "cat $TEST_DIR/mig_file" processes stay around. That doesn't seem
>>>> right.
>>>>
>>>> However, this series doesn't seem to make it worse[2]... So I'm
>>>> keeping
>>>> it. I suppose it's just some issue with the test.
>>>>
>>>> Max
>>>>
>>>>
>>>> [1] Sometimes there are migration event timeouts, sometimes just VM
>>>> launch timeouts (specifically when VM B is supposed to be re-launched
>>>> just after it has been shut down), and sometimes I get a dirty bitmap
>>>> hash mismatch.
>>>>
>>>>
>>>> [2] The whole timeline was:
>>>>
>>>> - Apply this series, everything seems alright
>>>>
>>>> (a couple of hours later)
>>>> - Test some other things, stumble over 169 once or so
>>>>
>>>> - Focus on 169, fails a bit more often
>>>>
>>>> (today)
>>>> - Can't get it to work at all
>>>>
>>>> - Can't get it to work in any version, neither before nor after this
>>>> patch
>>>>
>>>> - Lose my sanity
>>>>
>>>> - Write this email
>>>>
>>>> O:-)
>>>>
>>>
>>> hmm.. checked on current master (7b93d78a04aa24), tried a lot of
>>> times in a loop, works for me. How can I help?
>>>
>>
>> O, loop finally finished, with:
>>
>> 169 6s ... [failed, exit status 1] - output mismatch (see 169.out.bad)
>> --- /work/src/qemu/master/tests/qemu-iotests/169.out 2018-03-16
>> 21:01:19.536765587 +0300
>> +++ /work/src/qemu/master/tests/qemu-iotests/169.out.bad 2018-03-27
>> 12:33:03.804800350 +0300
>> @@ -1,5 +1,20 @@
>> -........
>> +......E.
>> +======================================================================
>> +ERROR: test__persistent__not_migbitmap__offline
>> (__main__.TestDirtyBitmapMigration)
>> +methodcaller(name, ...) --> methodcaller object
>> +----------------------------------------------------------------------
>> +Traceback (most recent call last):
>> + File "169", line 129, in do_test_migration
>> + self.vm_b.event_wait("RESUME", timeout=10.0)
>> + File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qemu.py", line
>> 349, in event_wait
>> + event = self._qmp.pull_event(wait=timeout)
>> + File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qmp/qmp.py",
>> line 216, in pull_event
>> + self.__get_events(wait)
>> + File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qmp/qmp.py",
>> line 124, in __get_events
>> + raise QMPTimeoutError("Timeout waiting for event")
>> +QMPTimeoutError: Timeout waiting for event
>> +
>> ----------------------------------------------------------------------
>> Ran 8 tests
>>
>> -OK
>> +FAILED (errors=1)
>> Failures: 169
>> Failed 1 of 1 tests
>>
>>
>> and I have a lot of opened pipes, like:
>>
>> root 18685 0.0 0.0 107924 352 pts/0 S 12:19 0:00 cat
>> /work/src/qemu/master/tests/qemu-iotests/scratch/mig_file
>>
>> ...
>>
>> restart testing loop, it continues to pass 169 again and again...
>>
>
> .... and,
>
> --- /work/src/qemu/master/tests/qemu-iotests/169.out 2018-03-16
> 21:01:19.536765587 +0300
> +++ /work/src/qemu/master/tests/qemu-iotests/169.out.bad 2018-03-27
> 12:58:44.804894014 +0300
> @@ -1,5 +1,20 @@
> -........
> +F.......
> +======================================================================
> +FAIL: test__not_persistent__migbitmap__offline
> (__main__.TestDirtyBitmapMigration)
> +methodcaller(name, ...) --> methodcaller object
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> + File "169", line 136, in do_test_migration
> + self.check_bitmap(self.vm_b, sha256 if persistent else False)
> + File "169", line 77, in check_bitmap
> + "Dirty bitmap 'bitmap0' not found");
> + File "/work/src/qemu/master/tests/qemu-iotests/iotests.py", line 422,
> in assert_qmp
> + result = self.dictpath(d, path)
> + File "/work/src/qemu/master/tests/qemu-iotests/iotests.py", line 381,
> in dictpath
> + self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
> +AssertionError: failed path traversal for "error/desc" in "{u'return':
> {u'sha256':
> u'01d2ebedcb8f549a2547dbf8e231c410e3e747a9479e98909fc936e0035cf8b1'}}"
> +
> ----------------------------------------------------------------------
> Ran 8 tests
>
> -OK
> +FAILED (failures=1)
> Failures: 169
> Failed 1 of 1 tests
>
>
> isn't it because a lot of cat processes? will check, update loop to
> i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; done

Hmm... I know I tried to kill all of the cats, but for some reason that
didn't really help yesterday. Seems to help now, for 2.12.0-rc0 at
least (that is, before this series).

After the whole series, I still get a lot of failures in 169
(mismatching bitmap hash, mostly).
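
For anyone who wants to reproduce this, the shell loop quoted above can also be written as a small Python helper. This is only a sketch: run_until_failure() and its parameters are made-up names, not part of the iotests harness, and the command lists are whatever you pass in.

```python
import subprocess


def run_until_failure(test_cmd, cleanup_cmd=None, max_runs=100):
    """Run test_cmd repeatedly and return how many runs passed
    before the first failure (or max_runs if none failed)."""
    passes = 0
    for _ in range(max_runs):
        if subprocess.run(test_cmd).returncode != 0:
            break
        passes += 1
        print("%d OK" % passes)
        if cleanup_cmd is not None:
            # The cleanup's exit status is ignored on purpose: a tool like
            # killall exits non-zero when there was nothing left to kill.
            subprocess.run(cleanup_cmd,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
    return passes
```

Called from tests/qemu-iotests as run_until_failure(["./check", "-qcow2", "169"], ["killall", "-9", "cat"]), it should behave like the quoted loop. Note that killing every cat on the machine is as blunt here as it is in the shell version.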

And interestingly, if I add an abort():

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9204c1c0ac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1481,6 +1481,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     }
 
     if (bdrv_dirty_bitmap_next(bs, NULL)) {
+        abort();
         /* It's some kind of reopen with already existing dirty bitmaps. There
          * are no known cases where we need loading bitmaps in such situation,
          * so it's safer don't load them.

Then this fires for a couple of test cases of 169 even without the third
patch of this series.

I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration
adds or something? Then this would be the wrong condition, because I
guess we still want to load the bitmaps that are in the qcow2 file.

I'm not sure whether bdrv_has_readonly_bitmaps() is the correct
condition then, either, though. Maybe let's take a step back: We want
to load all the bitmaps from the file exactly once, and that is when it
is opened the first time. Or that's what I would have thought... Is
that even correct?

Why do we load the bitmaps when the device is inactive anyway?
Shouldn't we load them only once the device is activated?
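
To make the question concrete, here is a toy model in Python of the behaviour I would have expected. It is not QEMU code and says nothing about what qcow2 actually does: ToyNode and all of its members are hypothetical, and the idea that migration adds bitmaps before the image's own bitmaps are loaded is just the guess from above.

```python
class ToyNode:
    """Toy stand-in for a block node with persistent bitmaps (not QEMU code)."""

    def __init__(self, stored_bitmaps, inactive=True):
        self.stored_bitmaps = list(stored_bitmaps)  # names kept in the image file
        self.bitmaps = {}              # in-memory bitmaps: name -> origin
        self.loaded_from_file = False  # dedicated "already loaded" flag
        self.inactive = inactive
        if not inactive:
            self._load_once()

    def add_migrated_bitmap(self, name):
        # Assumption: the migration stream can create bitmaps while the
        # node is still inactive.
        self.bitmaps[name] = "migration"

    def any_bitmap_exists(self):
        # Mimics a "does any bitmap exist?" check; it cannot tell a
        # migrated bitmap from one loaded from the file.
        return bool(self.bitmaps)

    def _load_once(self):
        # Keyed on the dedicated flag, not on whether any bitmap exists.
        if self.loaded_from_file:
            return
        for name in self.stored_bitmaps:
            self.bitmaps.setdefault(name, "file")
        self.loaded_from_file = True

    def activate(self):
        self.inactive = False
        self._load_once()
```

In this model, a node that has received a bitmap from migration already reports any_bitmap_exists() as true before anything was read from the file, so a loader guarded by that check would skip the stored bitmaps. Guarding on a separate flag and loading in activate() gives "exactly once" regardless of what migration added in the meantime.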
Max