Re: [Qemu-devel] [PULL v4 11/11] rdma: add documentation


From: Michael R. Hines
Subject: Re: [Qemu-devel] [PULL v4 11/11] rdma: add documentation
Date: Thu, 18 Apr 2013 20:57:30 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

I'm very sorry. I totally missed this email. My apologies.


On 04/18/2013 02:55 AM, Michael S. Tsirkin wrote:
On Wed, Apr 17, 2013 at 07:07:20PM -0400, address@hidden wrote:
From: "Michael R. Hines" <address@hidden>

docs/rdma.txt contains full documentation,
wiki links, github url and contact information.

Signed-off-by: Michael R. Hines <address@hidden>
OK that's better. Need to improve the following areas:
- fix half-sentences such as 'faster' (without saying than what)
- document tradeoffs

Besides the tradeoff between memory registration cost and the latency
and throughput gains, are there other tradeoffs that you would
specifically like to see commented on that are not already listed?
The documentation has both a "Before running" section and a brief
"performance" section.

The documentation is quite long now. A full paper has already been
written which clearly documents the tradeoffs.
Any more than that and we would be re-writing the paper that is already
linked to - which is quite thorough already, based on the work from 2010.

- better document how to run

Better? I don't understand.

What more is needed besides the QMP migrate command?

One of the libvirt developers already told me not to include
any libvirt commands in the QEMU documentation.

- add more examples

Examples of what?

The only option remaining from the review process
is chunk registration, and the documentation has instructions
on how to toggle that option.


---

+BEFORE RUNNING:
+===============
+
+RDMA helps make your migration more deterministic under heavy load because
+of the significantly lower latency and higher throughput
Higher and lower than what? Above is not helpful and subtly wrong. Say instead

'On infiniband networks, RDMA can achieve lower latency and higher
throughput than IP over infiniband based networking by reducing the
amount of interrupts and data copies and bypassing the host networking
stack. Using RDMA for VM migration makes migration more deterministic
under heavy VM load'.

And add an example of what 'more deterministic' means.


Acknowledged.


provided by infiniband.
Does this work on top of other RDMA transports or just infiniband?
Needs clarification.

Acknowledged. I will include RoCE in the description.

+
+Use of RDMA during migration requires pinning and registering memory
+with the hardware. This means that memory must be resident in memory
+before the hardware can transmit that memory to another machine.
Above is too vague to be of real use. Please insert here the
implications for host memory size versus total VM memory size.
Also add some examples.

I included a simple 8GB VM example already. Can you be more specific?
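
To spell it out with purely illustrative numbers (not from the patch
itself): an 8GB VM migrated with dynamic registration disabled needs
roughly 8GB of pinned, resident memory on both the source and the
destination hypervisor, so on a host whose RAM is mostly committed to
other VMs, management software has to confirm that much lockable
memory is actually free before starting the migration.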

+If this is not acceptable for your application or product,
+then the use of RDMA migration is strongly discouraged and you
+should revert back to standard TCP-based migration.
Above is not helpful and will just lead to more questions.
Remove.

Why is it not helpful?

It is a clear warning that RDMA can be
harmful to other software running on the hypervisor if the relocation
is not planned for in advance by management software.

+
+Experimental: Next, decide if you want dynamic page registration.
+For example, if you have an 8GB RAM virtual machine, but only 1GB
+is in active use,
This is wrong, isn't it? You only skip zero pages, so any page
that has data, even if it's not in active use, will be pinned.

Active use != dirty. That's why I chose the word "used".

Used includes both accessed and dirty pages.

A page can be used and later transition to dirty.

To be used means that a page *must* have been accessed at least
once before it was mapped by the operating system.

I don't think it's our job to get into the "finer points"
of kernel memory management in a higher-level set
of documentation like QEMU.


then disabling this feature will cause all 8GB to
+be pinned and resident in memory.
Add as opposed to the default behaviour which is ....

With all due respect, aren't we micro-managing here?
That was clearly described at the beginning of the documentation.

This feature mostly affects the
+bulk-phase round of the migration and can be disabled for extremely
+high-performance RDMA hardware
Above is meaningless; it does not help the user know whether her hardware
is "extremely high-performance". Put numbers here please.
Does it help 40G cards but not 20G ones? By how much?

Acknowledged.


using the following command:
+
+QEMU Monitor Command:
+$ migrate_set_capability x-chunk-register-destination off # enabled by default
+
+Performing this action will cause all 8GB to be pinned, so if that's
+not what you want, then please ignore this step altogether.
+
+On the other hand, this will also significantly speed up the bulk round
+of the migration, which can greatly reduce the "total" time of your migration.

Please add some example numbers so people know what the tradeoff is.

Acknowledged.
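
In the meantime, the tradeoff can be measured directly on one's own
hardware; a rough sketch using the same monitor commands shown above
plus the standard "info migrate" (the numbers will obviously depend
on the NIC):

QEMU Monitor Command:
$ migrate_set_capability x-chunk-register-destination off
$ migrate -d x-rdma:host:port
$ info migrate   # note "total time", then repeat with the capability left on and compare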

+
+RUNNING:
+========
+
+First, set the migration speed to match your hardware's capabilities:
+
+QEMU Monitor Command:
+$ migrate_set_speed 40g # or whatever is the MAX of your RDMA device
+
+Next, on the destination machine, add the following to the QEMU command line:
+
+qemu ..... -incoming x-rdma:host:port
+
+Finally, perform the actual migration:
+
+QEMU Monitor Command:
+$ migrate -d x-rdma:host:port
+
Note: users stop reading here; below is info for developers.
So please add here the requirement to run ulimit and with what value.
Also add an example with a VM size.

Exactly what is the right ulimit value? Only the administrator
can determine that value based on how much free memory
is available on the hypervisor. If there is plenty of memory,
then no ulimit command is required at all, because QEMU can
safely pin the entire VM. If there is not enough free memory,
then ulimit must be set to the amount of available free memory.

Should I just make a general statement like that?
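
For example, a general statement plus an illustrative sketch along
these lines (the numbers are placeholders, not recommendations):

# 8GB VM with plenty of free host memory: no ulimit needed at all.
# 8GB VM on a host with only ~4GB free: cap locked memory to what is
# actually available (ulimit -l takes kilobytes):
$ ulimit -l 4194304
$ qemu ..... -incoming x-rdma:host:port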


+TODO:
+=====
+1. 'migrate x-rdma:host:port' and '-incoming x-rdma' options will be
+   renamed to 'rdma' after the experimental phase of this work has
+   completed upstream.
+2. Currently, 'ulimit -l' mlock() limits as well as cgroups swap limits
+   are not compatible with infiniband memory pinning and will result in
+   an aborted migration (but with the source VM left unaffected).
+3. Use of the recent /proc/<pid>/pagemap would likely speed up
+   the use of KSM and ballooning while using RDMA.
For KSM you'll need the _GIFT patch for this, I think - maybe note this.

I would prefer not to document features that do not yet exist.

In the near future, I will probably use the pagemap, in which
case I can update the documentation at that time.

- Michael



