
Re: [Qemu-devel] [PATCH v7 00/12] rdma: migration support


From: Michael R. Hines
Subject: Re: [Qemu-devel] [PATCH v7 00/12] rdma: migration support
Date: Thu, 13 Jun 2013 17:17:17 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 06/13/2013 04:06 PM, Paolo Bonzini wrote:

> (CC-ing qemu-devel).

>> OK, that's good to know. This means that we need to bring up the mlock()
>> problem as a "larger" issue in the Linux community instead of the QEMU
>> community.
>>
>> In the meantime, how about I update the RDMA patch to do the following:
>>
>> 1. Solution #1: If the user requests "x-rdma-pin-all", then:
>>        if QEMU was started with "-realtime mlock=on", allow the
>>        capability; otherwise, disallow the capability.
>>
>> 2. Solution #2: Create a NEW QEMU monitor command which locks memory
>>    *in advance*, before the migrate command occurs, to clearly indicate
>>    to the user that the cost of locking memory must be paid before the
>>    migration starts.
>>
>> Which solution do you prefer? Or do you have an alternative idea?
> Let's just document it in the release notes.  There's time to fix it.

> Regarding the timestamp problem, it should be fixed in the RDMA code.
> You did find a bug, but xyz_start_outgoing_migration should be
> asynchronous and the pinning should happen in the setup phase.  This is
> because the setup phase is already running outside the big QEMU lock and
> the guest would not be frozen.

I think you misunderstood the symptom. The pinning is *already*
happening in the setup phase (xyz_start_outgoing_migration), not
inside the migration_thread().

The problem is in Linux: the guest appears to be frozen not because
of any locks, but because the pinning itself (allocating and clearing
memory) slows down the virtual machine so much that it looks like it
is not running.

> I think the patches are ready for merging, because incremental work
> makes it easier to discuss the changes (*), but you really need to do
> two things before 1.6, or I would rather revert them.

Yes, could someone go ahead and pull them? They have been thoroughly tested.


> (1) move the pinning to the setup phase
This is already done in the existing patchset.

> (2) add a debug mode where every pass unpins all the memory and
> restarts.  Speed doesn't matter; this is so that the protocol supports
> it from the beginning, and any caching heuristics need to be done on
> the source side.  As with all debug modes, it will be somewhat prone to
> bitrot, but at least there is a reference implementation for anyone who
> later wants to add caching.
>
> I think (2) is very important so that, for example, during fault
> tolerance you can reduce the pinned size a bit for smaller workloads,
> even without ballooning.

I agree that this is a necessary feature (dynamic source registration),
but it is a lot more complicated than a simple unpin of everything
before every pass. As you suggested, I would rather not introduce unused
code, but instead wait until someone in the future has a fully
functional, testable implementation.

Actually, I am already working on a fault-tolerance implementation as we
speak and will be posting it soon, so it's likely I will submit a patch
to do something like this at that time.

>     (*) for example, why the introduction of acct_update_position?  Is
>     it a fix for a bug that always existed, or driven by some other
>     changes?

This is important because RDMA writes do not happen synchronously. It is
impossible to update the accounting inside of save_live_iterate()
because the RDMA operations are still outstanding.

Only after they have completed later can we actually know what the
accounting statistics really are.

- Michael



