qemu-devel


From: Lei Li
Subject: Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
Date: Fri, 22 Nov 2013 19:29:05 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 11/21/2013 06:19 PM, Daniel P. Berrange wrote:
On Thu, Nov 21, 2013 at 05:11:23PM +0800, Lei Li wrote:
This patch series introduces a mechanism that uses a side-channel
pipe for RAM, passed via SCM_RIGHTS over the unix domain socket
migration protocol.
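The fd-passing step described above can be sketched roughly as follows. This is a minimal illustration of SCM_RIGHTS over an AF_UNIX socket, not QEMU's actual code; the function name and error handling are assumptions.

```c
/* Sketch: sending a pipe fd over a connected unix domain socket
 * via SCM_RIGHTS ancillary data. Illustrative only, not QEMU code. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_pipe_fd(int sock, int pipe_fd)
{
    char dummy = 'F';                 /* at least one byte of real data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cmsg;

    memset(ctrl, 0, sizeof(ctrl));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;     /* ancillary payload is an fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &pipe_fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```

The receiver extracts the fd from the ancillary data with recvmsg() and CMSG_DATA(); the kernel installs a fresh descriptor referring to the same pipe in the receiving process.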

This side channel is used for page flipping via vmsplice, the
internal mechanism behind the localhost migration we are trying
to add to QEMU. Background info and the previous patch series,
for reference:

Localhost migration
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg02916.html

migration: Introduce side channel for RAM
http://lists.gnu.org/archive/html/qemu-devel/2013-09/msg04043.html

I have picked patches from the localhost migration series and rebased
them on the side channel series; it is now a complete series that
passes the basic test.

Please let me know if there is anything that needs to be fixed or
improved. Your suggestions and comments are very welcome, and thanks
to Paolo for his continued review and useful suggestions.
In discussions about supporting this for libvirt, we were told that
when this localhost migration fails, you cannot re-start the guest
on the original source QEMU.

If this is true, this implementation is not satisfactory IMHO. One
of the main motivations of this feature is to allow for in-place
live upgrades of QEMU binaries, for people who can't tolerate the
downtime of restarting their guests, and who don't have a spare
host to migrate them to.

If people are using this because they can't tolerate any downtime
of the guest, then we need to be able to fully deal with failure to
complete migration by switching back to the original QEMU process,
as we can do with normal non-localhost migration.

Hi Daniel,

Page flipping is introduced here not primarily for low downtime, but
to avoid requiring enough free memory to fit an additional copy of
the largest guest, which is the requirement with current localhost
migration; see the additional explanation from Anthony in the first
proposal version [1].

Of course, low downtime also matters for page flipping migration,
since its use case is 'live' upgrade of a running QEMU instance, so
we expect page flipping through vmsplice to be fast enough. In the
current initial implementation the downtime is not good, but we are
working on it, and there has been some work on the kernel side [2].

During page flipping migration, the RAM pages of the source guest
are flipped to the destination; that is why the source guest cannot
be resumed. AFAICT, page flipping migration may fail at the
connection stage (including the exchange of the pipe fd) or at the
migration registration stage (say, a blocker such as an unsupported
migration device), and in those situations the source can still be
resumed, since the memory has not yet been flipped away. Once the
connection is successfully set up, it proceeds to the transmission
of RAM pages, which hardly ever fails. For failure handling in
Libvirt, ZhengSheng has proposed restarting the old QEMU instead of
resuming it. I know 'hardly ever' is not a good answer to your
concern, but it is the cost of the limited memory, IMO.

So if downtime is key for the user, or if there is zero tolerance
for restarting QEMU, page flipping migration might not be a good
choice. From the perspective of a management app like Libvirt, the
'live upgrade' of QEMU will be done through localhost migration
anyway, and there are other migration solutions with lower downtime,
like real live migration and the postcopy migration that Paolo
mentioned in the previous version [3]. Why not have more than one
choice for it?


[1] http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html
[2] http://article.gmane.org/gmane.linux.kernel/1574277
[3] http://lists.gnu.org/archive/html/qemu-devel/2013-10/msg03212.html

Regards,
Daniel


--
Lei



