Re: [Qemu-devel] When does live migration give up?
From: Alex Bligh
Subject: Re: [Qemu-devel] When does live migration give up?
Date: Wed, 4 Sep 2013 23:37:42 +0100
Paolo,
>
> Do you mean something like this?
>
> destination
> socket()
> bind() to { sin_port = 0, sin_addr.s_addr = INADDR_ANY }
> listen()
> getsockname()
> send address to source
> accept()
> start QEMU with file descriptor returned by accept
>
> source
> read address
> socket()
> connect()
> pass socket file descriptor to QEMU and migrate to it
>
> Anything that doesn't use sin_port = 0 and getsockname() is prone to
> race conditions.
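As a rough illustration of the sequence quoted above, here is a minimal Python sketch of the port-0 handshake. The function names and the loopback default are assumptions made for the example (the quoted steps bind to INADDR_ANY), and the hand-off of the descriptors to QEMU is only noted in a comment.

```python
import socket


def listen_on_ephemeral_port(host="127.0.0.1"):
    """Destination side: bind to port 0 and report the address the kernel chose."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))            # port 0: the kernel picks a free port
    srv.listen(1)
    return srv, srv.getsockname()  # (host, port) to send to the source


def connect_to(address):
    """Source side: connect to the address read from the destination."""
    return socket.create_connection(address)


if __name__ == "__main__":
    srv, addr = listen_on_ephemeral_port()
    src = connect_to(addr)         # completes through the listen backlog
    conn, _ = srv.accept()
    # In the quoted scheme, conn.fileno() would go to the destination QEMU
    # and src.fileno() to the source QEMU; that hand-off is not shown here.
    src.sendall(b"ping")
    print(conn.recv(4))
    for s in (conn, src, srv):
        s.close()
```

Because the kernel both chooses and reserves the port inside bind(), there is no window in which another process can take it between selection and use.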
From memory, we bind() to a specific, randomly chosen port and, if
that fails, retry until bind() succeeds. We do this because we
want the port to fall within a given range. I believe that is
race-free, since only one bind() to a given port can succeed.
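A minimal Python sketch of that retry loop, for illustration only: the function name, the attempt limit, and the loopback default are assumptions, not what our code actually looks like. It relies on bind() itself being the claim on the port, with SO_REUSEADDR left unset.

```python
import errno
import random
import socket


def bind_in_range(low, high, host="127.0.0.1", attempts=100):
    """Bind a listening socket to a random port in [low, high].

    Retries with a fresh random port whenever bind() reports EADDRINUSE.
    """
    for _ in range(attempts):
        port = random.randint(low, high)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))  # succeeds for at most one socket per port
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise                # unrelated failure: do not mask it
            continue                 # port taken: pick another one
        sock.listen(1)
        return sock, port
    raise RuntimeError("no free port found in range")
```

The port is never tested for availability separately from being claimed, which is what keeps the loop free of a check-then-use race.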
>> Approx 10% of migrations die after many minutes on the
>> customer's platform. This does not appear to happen if migrations are
>> not carried out 50 at a time.
>
> Dying after many minutes usually means that the destination is not set
> up the same as the source, as you said below.
Hmmm. OK, I thought that produced an immediate error. Is there any way
to log what is going wrong, to stderr or similar?
Alex
>
> Paolo
>
>> We appear to be getting something other than 'ms' returned through the
>> monitoring system. Unhelpfully what that is is not logged.
>>
>> Is there anything (apart from the socket closing prematurely) which can
>> cause a failed migration after many minutes? We've seen problems where
>> the destination is not set up the same as the source (e.g. different
>> numbers of NICs) but IIRC that fails much earlier.
>>
>> To make things easier (cough), this is qemu 1.0 (as shipped with Ubuntu
>> Precise).
>>
--
Alex Bligh