Re: [Qemu-devel] Live migration hangs after migration to remote host
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Live migration hangs after migration to remote host
Date: Wed, 29 Jul 2015 11:38:44 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
* Eduardo Otubo (address@hidden) wrote:
> On Wed, Jul 29, 2015 at 10:32:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Eduardo Otubo (address@hidden) wrote:
> > > On Wed, Jul 29, 2015 at 09:11:21AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Eduardo Otubo (address@hidden) wrote:
> > > > > On Tue, Jul 28, 2015 at 04:19:46PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Eduardo Otubo (address@hidden) wrote:
> > > > > > > Hello all,
> > > > > > >
> > > > > > > I'm facing a weird behavior on my tests: I am able to live migrate
> > > > > > > between two virtual machines on my localhost, but not to another
> > > > > > > machine, both using tcp.
> > > > > > >
> > > > > > > * I am using the same arguments on the command line;
> > > > > > > * Both virtual machines use the same qcow2 file, visible through
> > > > > > > NFS;
> > > > > > > * Both machines are in the same subnet;
> > > > > > > * Migration is being done from intel to intel;
> > > > > > > * Same version of Qemu (github master - f8787f8723);
> > > > > > >
> > > > > > > Using all of the above, I am able to live migrate on the same host,
> > > > > > > either between two VMs on the local host or between two VMs on the
> > > > > > > remote host; but when migrating from local to remote, the guest
> > > > > > > hangs. I can still access its console via ctrl+alt+2, though, and
> > > > > > > everything there seems normal. If I issue a reboot via the console
> > > > > > > on the remote, the guest gets back to normal.
> > > > > > > Am I missing something here?
> > > > > >
> > > > > > Just checking, but are you saying that, as far as qemu is concerned,
> > > > > > the migration is happy and it's just the guest that's hung?
> > > > >
> > > > > That's exactly the case. The console (via ctrl+alt+2) is active and
> > > > > responding to all commands normally, but the screen (ctrl+alt+1) is
> > > > > frozen and I can't interact with it at all.
> > > >
> > > > Are you driving this via libvirt or using qemu monitor directly?
> > > > If the latter, can you please get an 'info migrate' from the source
> > > > and an 'info status' from the destination at the end of migrate.
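When QEMU is driven directly rather than through libvirt, the same two answers can also be collected over QMP. The following is a minimal sketch, not a tested tool: it assumes each QEMU was started with a QMP socket such as `-qmp unix:/tmp/qmp-src.sock,server,nowait` (the socket paths are hypothetical), and it uses the QMP commands `query-migrate` and `query-status`, the QMP counterparts of `info migrate` and `info status`.

```python
import json
import socket


def qmp_command(rfile, wfile, command):
    """Send one QMP command and return its "return" payload.

    Asynchronous event messages (they carry an "event" key) are skipped.
    Raises RuntimeError if QEMU answers with an "error" object.
    """
    wfile.write(json.dumps({"execute": command}) + "\n")
    wfile.flush()
    while True:
        line = rfile.readline()
        if not line:
            raise RuntimeError("QMP connection closed")
        msg = json.loads(line)
        if "event" in msg:
            continue  # e.g. STOP/RESUME events emitted around migration
        if "error" in msg:
            raise RuntimeError(msg["error"].get("desc", "QMP error"))
        return msg.get("return")


def qmp_session(rfile, wfile, commands):
    """Read the greeting, leave capabilities-negotiation mode, run commands."""
    greeting = json.loads(rfile.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting: %r" % greeting)
    qmp_command(rfile, wfile, "qmp_capabilities")
    return {cmd: qmp_command(rfile, wfile, cmd) for cmd in commands}


def query_qemu(socket_path, commands):
    """Connect to a QMP UNIX socket (path is an assumption) and run commands."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw") as f:
            return qmp_session(f, f, commands)
```

With the hypothetical paths above, `query_qemu("/tmp/qmp-src.sock", ["query-migrate"])` would run on the source and `query_qemu("/tmp/qmp-dst.sock", ["query-status"])` on the destination; the `status` field in each result is what to compare with the HMP output.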
> > >
> > > I'm using the qemu command line directly. And I found the problem :) See
> > > below.
> > >
> > > >
> > > > > > Are the host clocks on the two hosts very close (there are lots of
> > > > > > weird corner cases with mismatched clocks) - same time zone?
> > > > >
> > > > > Yep. Both machines are in the same room and have the clock sync'ed.
> > > >
> > > > OK, good.
> > > >
> > > > > >
> > > > > > Are you using cache=none (given that it's NFS shared)
> > > > >
> > > > > I wasn't. But I tried again with cache=none and I got exactly the same
> > > > > thing.
> > > >
> > > > OK, and this pair of machines, have you tried both directions - i.e.
> > > > going a->b and b->a - do both directions fail?
> > > > Is the NFS server one of the two machines? If it is, and you're using
> > > > libvirt, make sure that the directory the disks are on is an NFS mount
> > > > on both machines; e.g. don't migrate directly from the NFS export.
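That check can be scripted on each host. A minimal sketch, assuming the Linux /proc/mounts format (device, mount point, filesystem type, ...); it finds the mount that contains the disk path and reports whether it is NFS. It does not decode octal escapes such as \040 in mount points, and the helper names are hypothetical.

```python
import os


def mount_for(path, mounts_text):
    """Return (mountpoint, fstype) of the mount containing `path`, or None.

    `mounts_text` is the content of /proc/mounts. The longest matching
    mount point wins; among equal mount points the later entry wins,
    since a later mount on the same directory hides the earlier one.
    Octal escapes such as \\040 (space) in mount points are not decoded.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype)
    return best


def is_on_nfs(path, mounts_text):
    """True if `path` lives on an NFS mount according to `mounts_text`."""
    found = mount_for(path, mounts_text)
    return found is not None and found[1] in ("nfs", "nfs4")
```

On each host, call `is_on_nfs(os.path.realpath(disk_path), open("/proc/mounts").read())` with the real disk path (`realpath` resolves symlinks first). On the machine that exports the directory, the export path itself shows up as a local filesystem such as ext4, which is exactly the case to avoid.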
> > > >
> > > > > Also, I tried the stable-2.2 branch and got the same behavior. I think
> > > > > it's very unlikely that such an important feature is broken upstream
> > > > > or on a stable branch; most probably I have something wrong in my
> > > > > environment.
> > > >
> > > > Yes, the challenge is to find what; and if it's something common
> > > > we should try and find a way of spotting it.
> > > >
> > > > > Anyway, I'll keep testing different stable- branches until I find
> > > > > something that works for me. I'll keep the mailing list posted.
> > > >
> > > > Could you share the qemu command line so we can see if we can
> > > > spot anything?
> > >
> > > Found the problem! I simplified my qemu command line as much as
> > > possible, removing anything I thought could cause the issue. Without
> > > further ado, this is the argument:
> > >
> > > -cpu 'Opteron_G4'
> > >
> > > Without this argument everything works as it should, console responsive
> > > and guest active :)
> >
> > Can you show cat /proc/cpuinfo off the two hosts?
> > (Only one CPU, but please include the whole entry)
>
> Intel host:
> processor : 7
> vendor_id : GenuineIntel
> cpu family : 6
> model : 60
> model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
> stepping : 3
> microcode : 0x1c
> cpu MHz : 883.468
> cache size : 8192 KB
> physical id : 0
> siblings : 8
> core id : 3
> cpu cores : 4
> apicid : 7
> initial apicid : 7
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts
> dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
> smep bmi2 erms invpcid xsaveopt
> bugs :
> bogomips : 6784.87
> clflush size : 64
> cache_alignment : 64
> address sizes : 39 bits physical, 48 bits virtual
> power management:
>
> AMD host:
> processor : 5
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 10
> model name : AMD Phenom(tm) II X6 1075T Processor
> stepping : 0
> microcode : 0x10000bf
> cpu MHz : 800.000
> cache size : 512 KB
> physical id : 0
> siblings : 6
> core id : 5
> cpu cores : 6
> apicid : 5
> initial apicid : 5
> fpu : yes
> fpu_exception : yes
> cpuid level : 6
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc
> extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb
> hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
> bogomips : 6027.25
> TLB size : 1024 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate cpb
OK, very different CPUs. My guess is that one or both of them don't support
some feature of the Opteron_G4. When specifying -cpu it's often best
to use the enforce option.
What happens if you try:
qemu-system-x86_64 -machine pc,accel=kvm -cpu Opteron_G4,enforce=on -nographic
on both hosts?
You need to pick a CPU model that passes that check on both of the hosts.
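The same idea can be approximated offline by comparing each host's /proc/cpuinfo flags with the features a candidate model needs. A minimal sketch under loud assumptions: `OPTERON_G4_SUBSET` is an illustrative, hand-picked subset of Bulldozer-era features, not QEMU's authoritative Opteron_G4 definition, and it uses Linux flag spellings (QEMU spells some differently, e.g. "sse4.1" for "sse4_1"). `enforce` remains the real check, because it compares against what QEMU and KVM actually expose.

```python
def parse_flags(cpuinfo_entry):
    """Return the set of flags from one /proc/cpuinfo processor entry."""
    for line in cpuinfo_entry.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required, host_flags):
    """Map each host name to the sorted list of required flags it lacks."""
    return {host: sorted(required - flags) for host, flags in host_flags.items()}


# ASSUMPTION: illustrative subset only, spelled with Linux /proc/cpuinfo
# names; this is not QEMU's model table for Opteron_G4.
OPTERON_G4_SUBSET = {
    "ssse3", "sse4_1", "sse4_2", "aes", "avx",
    "sse4a", "misalignsse", "3dnowprefetch", "xop", "fma4",
}
```

Fed with the two flag lists posted above (each on one line, as in the real /proc/cpuinfo), the Intel host comes out missing 3dnowprefetch, fma4, misalignsse, sse4a and xop, and the AMD host missing aes, avx, fma4, sse4_1, sse4_2, ssse3 and xop, so under this assumed subset neither host could provide the model in full.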
Dave
> > Dave
> >
> > > The documentation[1] says it's possible to migrate between AMD and
> > > Intel, but I think I hit a corner case: apparently I can't specify
> > > this exact CPU model. Is this a known issue? I couldn't find any
> > > reference on bugzilla or launchpad.
> > >
> > > [1] - http://www.linux-kvm.org/page/Migration
> > >
> > > --
> > > Eduardo Otubo
> > > ProfitBricks GmbH
> >
> >
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
>
> --
> Eduardo Otubo
> ProfitBricks GmbH
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK