From: Lei Li
Subject: Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
Date: Mon, 25 Nov 2013 15:29:36 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 11/22/2013 07:36 PM, Paolo Bonzini wrote:
On 22/11/2013 12:29, Lei Li wrote:
During page flipping migration, the RAM pages of the source guest are
flipped to the destination, which is why the source guest cannot be
resumed. AFAICT, page flipping migration may fail at the connection
stage (including the exchange of the pipe fd) and at the migration
registration stage (say, a blocker such as an unsupported migration device),
Unfortunately, some migration problems (e.g. misconfiguration of the
destination QEMU) cannot be detected until the device data is migrated.
This happens after RAM migration, so there is indeed a reliability problem.

Hi Paolo,

'Some migration problems cannot be detected until the device data is migrated':
do you mean that the outgoing migration has no idea of a failure on the incoming
side caused by misconfiguration of the destination QEMU?

In that case, if the migration fails only because the device state is
misconfigured on the destination, and the outgoing migration is not aware of
the failure, I think such handling (for example, synchronizing the device state
list with the incoming side?) should be added to the current migration protocol,
since it is currently missing. We cannot rely only on resuming the source
guest after such a failure. Or maybe the management app should handle it by
enforcing a correct configuration?
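
To make the 'synchronize the device state list' idea concrete, here is a
rough sketch in Python, not QEMU code. It assumes each side can announce a
mapping of device section name to version before any RAM page is flipped;
the names diff_device_sections and safe_to_flip are hypothetical and do not
exist in QEMU. Such a check could only catch mismatches that are visible in
the lists; a device load can still fail later for other reasons.

```python
# Hypothetical pre-flight check; none of these names exist in QEMU.
def diff_device_sections(source, dest):
    """Compare {section name: version} mappings announced by each side.

    Returns a list of human-readable problems. An empty list means the
    destination claims it can load every section the source will send.
    """
    problems = []
    for name, version in sorted(source.items()):
        if name not in dest:
            problems.append(f"{name}: missing on destination")
        elif dest[name] < version:
            problems.append(
                f"{name}: destination supports v{dest[name]}, source sends v{version}"
            )
    return problems


def safe_to_flip(source, dest):
    # Start the irreversible RAM page flipping only if the lists agree.
    return not diff_device_sections(source, dest)
```

If the list of problems is non-empty, the source could refuse to start page
flipping and keep running, instead of discovering the mismatch after its RAM
is gone.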


Postcopy would fix this (assuming the postcopy phase is reliable) by
migrating device data before any page flipping occurs.

Are you suggesting that page flipping should be coupled with postcopy
migration for live upgrade of QEMU, as in your comments on the previous version?
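
As I understand the ordering argument, it can be modelled with a toy sketch
like the one below. This is only an illustration under simplifying
assumptions: load_device_state and flip_pages are hypothetical callables that
return True on success, and the page flip is treated as a single irreversible
step. It is not how QEMU's migration code is structured.

```python
def migrate(load_device_state, flip_pages, device_first):
    """Return the state the source guest is left in after one attempt.

    Both callables are hypothetical stand-ins returning True on success.
    Once pages have been flipped, the source no longer owns its RAM.
    """
    steps = [("device", load_device_state), ("ram", flip_pages)]
    if not device_first:
        steps.reverse()
    flipped = False
    for kind, step in steps:
        if not step():
            # A failure after the flip leaves the source without its RAM.
            return "needs-restart" if flipped else "resumable"
        if kind == "ram":
            flipped = True
    return "migrated"
```

With device_first=True, a device load failure leaves the source resumable;
with device_first=False, the same failure is only seen after the flip.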


Paolo

but it could be resumed in such situations, since the memory has not yet
been flipped away. Once the connection is successfully set up, it
proceeds to the transmission of RAM pages, which hardly ever fails.
As for failure handling in Libvirt, ZhengSheng has proposed restarting
the old QEMU instead of resuming it. I know 'hardly' is not
a good answer to your concern, but it is the cost of the limited
memory IMO.

So if downtime is key for the user, or if there is zero tolerance for
restarting QEMU, page flipping migration might not be a good
choice. From the perspective of a management app like Libvirt, the
'live upgrade' of QEMU will be done through localhost migration, and
there are other migration solutions with lower downtime, like
real live migration and the postcopy migration that Paolo mentioned
in the previous version [3]. Why not offer more than one choice?



--
Lei



