qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration


From: Andrey Korolyov
Subject: Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
Date: Wed, 16 Jul 2014 22:25:40 +0400

On Wed, Jul 16, 2014 at 5:24 PM, Andrey Korolyov <address@hidden> wrote:
> On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti <address@hidden> wrote:
>> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <address@hidden> wrote:
>>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <address@hidden> wrote:
>>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>>> >> >
>>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <address@hidden>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>> >> >>>>
>>> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <address@hidden>
>>> >> >>>> wrote:
>>> >> >>>>>
>>> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <address@hidden>
>>> >> >>>>> wrote:
>>> >> >>>>>>
>>> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>> >> >>>>>>>
>>> >> >>>>>>> Hello,
>>> >> >>>>>>>
>>> >> >>>>>>> the issue is not specific to the iothread code because generic
>>> >> >>>>>>> virtio-blk also hangs up:
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> Do you know which version works well?  If you could bisect, 
>>> >> >>>>>> that'll
>>> >> >>>>>> help a lot.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>>                 Amit
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> Hi,
>>> >> >>>>>
>>> >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, 
>>> >> >>>>> though
>>> >> >>>>> every step takes about 10 minutes to complete.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Yay! It is even outside of virtio-blk.
>>> >> >>>>
>>> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>> >> >>>> Author: Marcelo Tosatti <address@hidden>
>>> >> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>> >> >>>>
>>> >> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>>> >> >>>> calculation
>>> >> >>>>
>>> >> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec 
>>> >> >>>> calculation.
>>> >> >>>>
>>> >> >>>>     Reported-by: Marcin Gibuła <address@hidden>
>>> >> >>>>     Cc: address@hidden
>>> >> >>>>     Signed-off-by: Marcelo Tosatti <address@hidden>
>>> >> >>>>     Signed-off-by: Paolo Bonzini <address@hidden>
>>> >> >>>
>>> >> >>>
>>> >> >>> Andrey,
>>> >> >>>
>>> >> >>> Can you please provide instructions on how to create reproducible
>>> >> >>> environment?
>>> >> >>>
>>> >> >>> The following patch is equivalent to the original patch, for
>>> >> >>> the purposes of fixing the kvmclock problem.
>>> >> >>>
>>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>>> >> >>> experiencing.
>>> >> >>>
>>> >> >>>
>>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>>> >> >>> index 272a88a..feb5fc5 100644
>>> >> >>> --- a/hw/i386/kvm/clock.c
>>> >> >>> +++ b/hw/i386/kvm/clock.c
>>> >> >>> @@ -17,7 +17,6 @@
>>> >> >>>  #include "qemu/host-utils.h"
>>> >> >>>  #include "sysemu/sysemu.h"
>>> >> >>>  #include "sysemu/kvm.h"
>>> >> >>> -#include "sysemu/cpus.h"
>>> >> >>>  #include "hw/sysbus.h"
>>> >> >>>  #include "hw/kvm/clock.h"
>>> >> >>>
>>> >> >>> @@ -66,7 +65,6 @@ static uint64_t 
>>> >> >>> kvmclock_current_nsec(KVMClockState *s)
>>> >> >>>
>>> >> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, 
>>> >> >>> sizeof(time));
>>> >> >>>
>>> >> >>> -    assert(time.tsc_timestamp <= migration_tsc);
>>> >> >>>      delta = migration_tsc - time.tsc_timestamp;
>>> >> >>>      if (time.tsc_shift < 0) {
>>> >> >>>          delta >>= -time.tsc_shift;
>>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void 
>>> >> >>> *opaque,
>>> >> >>> int running,
>>> >> >>>          if (s->clock_valid) {
>>> >> >>>              return;
>>> >> >>>          }
>>> >> >>> -
>>> >> >>> -        cpu_synchronize_all_states();
>>> >> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>>> >> >>>          if (ret < 0) {
>>> >> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>>> >> >>> strerror(ret));
>>> >> >>> diff --git a/migration.c b/migration.c
>>> >> >>> index 8d675b3..34f2325 100644
>>> >> >>> --- a/migration.c
>>> >> >>> +++ b/migration.c
>>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>>> >> >>>                  
>>> >> >>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>> >> >>>                  old_vm_running = runstate_is_running();
>>> >> >>>
>>> >> >>> +                cpu_synchronize_all_states();
>>> >> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>> >> >>>                  if (ret >= 0) {
>>> >> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>>> >> >
>>> >> >
>>> >> > It could also be useful to apply the above patch _and_ revert
>>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>>> >> >
>>> >> > Paolo
>>> >>
>>> >> Yes, it solved the issue for me! (it took much time to check because
>>> >> most of country` debian mirrors went inconsistent by some reason)
>>> >>
>>> >> Also trivial addition:
>>> >>
>>> >> diff --git a/migration.c b/migration.c
>>> >> index 34f2325..65d1c88 100644
>>> >> --- a/migration.c
>>> >> +++ b/migration.c
>>> >> @@ -25,6 +25,7 @@
>>> >>  #include "qemu/thread.h"
>>> >>  #include "qmp-commands.h"
>>> >>  #include "trace.h"
>>> >> +#include "sysemu/cpus.h"
>>> >
>>> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>>> >
>>> > That is, test with a stock qemu.git tree and the patch sent today,
>>> > on this thread, to move cpu_synchronize_all_states ?
>>> >
>>> >
>>>
>>> The main reason for things to work for me is a revert of
>>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other
>>> patches. I had tested two cases, with Alexander`s patch completely
>>> reverted plus suggestion from Marcelo and only with revert 9b178682
>>> plug same suggestion. The difference is that the until Alexander`
>>> patch is not reverted, live migration is always failing by the timeout
>>> value, and when reverted migration always succeeds in 8-10 seconds.
>>> Appropriate diffs are attached for the reference.
>>
>> Andrey,
>>
>> Can you please apply only the following attached patch to an upstream
>> QEMU git tree (move_synchronize_all_states.patch), plus the necessary
>> header file corrections, and attempt to reproduce?
>>
>> When you reproduce, please provide a backtrace and version of the QEMU
>> git tree, and instructions on how to reproduce:
>>
>> 1) full QEMU command line
>> 2) steps to reproduce
>>
>>
>
> Marcelo, as I can see, this patch resembles second case from my
> previous message exactly (there are diffs from the generic rc). I/O is
> not locking up there but live migration failing and libvirt moves a
> freezed state. I can try to run the same on top of rc2, but it`ll be
> probably the same.


Tested on iscsi pool, though there are no-cache requirement and rbd
with disabled cache may survive one migration but iscsi backend hangs
always. As it was before, just rolling back problematic commit fixes
the problem and adding cpu_synchronize_all_states to migration.c has
no difference at a glance in a VM` behavior. The problem consist at
least two separate ones: the current hang and behavior with the
unreverted patch from agraf - last one causes live migration with
writeback cache to fail, cache=none works well in any variant which
survives first condition. Marcin, would you mind to check the current
state of the problem on your environments in a spare time? It is
probably easier to reproduce on iscsi because of way smaller time
needed to set it up, command line and libvirt config attached
(v2.1.0-rc2 plus iscsi-1.11.0).

Attachment: cli.txt
Description: Text document

Attachment: libvirt-config.xml
Description: Text Data

Attachment: fio.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]