From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Mon, 18 Mar 2013 16:24:44 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote:
I think there are two things here: API documentation and protocol documentation. The protocol documentation still needs some more work. Also, if what I understand from this document is correct, this breaks memory overcommit on the destination, which needs to be fixed.

I think something chunk-based on the destination side is required as well. You also can't trust the source to tell you the chunk size: it could be malicious and ask for too much. Maybe the source gives a chunk size hint and the destination responds with what it wants to use.
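
To illustrate the hint-and-response idea, here is a minimal C sketch in which the destination treats the source's value purely as a hint and clamps it to its own limits. The limits, the power-of-two rounding, and the name negotiate_chunk_size() are all assumptions made for this example; none of them come from the patch series.

```c
#include <stdint.h>

/* Hypothetical limits chosen by the destination; not taken from the patch. */
#define DEST_MIN_CHUNK (4u * 1024u)      /* one 4K page */
#define DEST_MAX_CHUNK (1024u * 1024u)   /* 1 MiB cap */

/*
 * Return the chunk size the destination is willing to use.
 * The source's value is only a hint: it is clamped to the destination's
 * own range and then rounded down to a power of two.
 */
static uint32_t negotiate_chunk_size(uint32_t source_hint)
{
    uint32_t size = source_hint;

    if (size < DEST_MIN_CHUNK) {
        size = DEST_MIN_CHUNK;
    }
    if (size > DEST_MAX_CHUNK) {
        size = DEST_MAX_CHUNK;
    }

    /* Largest power of two that is >= DEST_MIN_CHUNK and <= size. */
    uint32_t pow2 = DEST_MIN_CHUNK;
    while (pow2 <= size / 2) {
        pow2 *= 2;
    }
    return pow2;
}
```

The destination would send the returned value back to the source, so an oversized or malicious hint can never force the destination to register more memory per chunk than it has chosen to allow.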

Do we allow ballooning *during* the live migration? Is that necessary?

Would it be sufficient to inform the destination which pages are ballooned
and then only register the ones that the VM actually owns?

Is there any feature and/or version negotiation? How are we going to
handle compatibility when we extend the protocol?

You mean, on top of the protocol versioning that's already
built into QEMUFile, inside qemu_savevm_state_begin()?

Should I piggy-back an additional protocol version number
before QEMUFile sends its version number?

So how does the destination know it's OK to send anything to the source? I suspect this is wrong. When using CM you must post on the RQ before completing the connection negotiation, not after it's done.

This is already handled by the RDMA connection manager (librdmacm).

The library already has functions such as rdma_listen() and rdma_accept(),
analogous to listen() and accept() in TCP.

Once these functions return success, we have a guarantee that both
sides of the connection have already posted the appropriate work
requests sufficient for driving the migration.


+2. We transmit an empty SEND to let the sender know that
+   we are *ready* to receive some bytes from QEMUFileRDMA.
+   These bytes will come in the form of another SEND.

Using an empty message seems somewhat hacky; a fixed header in the
message would let you do more things if the protocol is ever extended.

Great idea. I'll add a struct RDMAHeader, including a version number,
to each SEND message in the next RFC.

(Until now, there were *only* QEMUFile bytes, nothing else,
so I didn't have any reason for a formal structure.)
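
As a rough sketch of what such a fixed header could look like, the C below defines a three-field header and encodes it in network byte order. Only the name RDMAHeader and the presence of a version number come from this thread; the type and length fields, the message-type values, and the helper names are assumptions made for illustration.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

#define RDMA_HDR_VERSION 1u   /* hypothetical initial version */

/* Hypothetical message types; the patch series does not define these. */
enum {
    RDMA_MSG_READY = 1,   /* receiver has posted a buffer */
    RDMA_MSG_DATA  = 2,   /* payload carries QEMUFile bytes */
    RDMA_MSG_ERROR = 3    /* sender is aborting the migration */
};

/* Fixed-size header; every field is big-endian on the wire. */
typedef struct RDMAHeader {
    uint32_t version;
    uint32_t type;
    uint32_t length;      /* payload bytes following the header */
} RDMAHeader;

#define RDMA_HDR_WIRE_SIZE (3 * sizeof(uint32_t))

/* Serialize h into buf, which must hold RDMA_HDR_WIRE_SIZE bytes. */
static void rdma_header_encode(uint8_t *buf, const RDMAHeader *h)
{
    uint32_t wire[3] = { htonl(h->version), htonl(h->type), htonl(h->length) };
    memcpy(buf, wire, sizeof(wire));
}

/* Parse buf into h. Returns 0 on success, -1 on an unknown version. */
static int rdma_header_decode(const uint8_t *buf, RDMAHeader *h)
{
    uint32_t wire[3];

    memcpy(wire, buf, sizeof(wire));
    h->version = ntohl(wire[0]);
    h->type    = ntohl(wire[1]);
    h->length  = ntohl(wire[2]);
    return h->version == RDMA_HDR_VERSION ? 0 : -1;
}
```

With a layout along these lines, the "ready" notification becomes a header-only message with length 0 instead of an empty SEND, and a receiver can reject a peer whose version it does not understand before touching the payload.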


OK, to summarize flow control: at any time there are either 0 or 1 outstanding buffers in the RQ. At any time only one side can talk. The destination always goes first, then the source, and so on. At any time a single SEND message can be passed. Just FYI, this means you are often at 0 buffers in the RQ, and IIRC 0 buffers is a worst-case path for InfiniBand. It's better to keep at least 1 buffer in the RQ at all times, so prepost 2 initially so that it fluctuates between 1 and 2.

That's correct. Having 0 buffers is not possible - sending
a message with 0 buffers would throw an error. The "protocol"
as I described ensures that there is always one buffer posted
before waiting for another message to arrive.

I avoided "better" flow control because the non-live state
is so small in comparison to the pc.ram contents that would be sent.
The non-live state is in the range of kilobytes, so it seemed silly to
have more rigorous flow control.
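
For comparison, the difference between preposting 1 and 2 receive buffers can be seen in a toy model that only counts buffers. This is not libibverbs code: the ToyRQ type and the rq_* helpers are hypothetical, and a real implementation would call ibv_post_recv() where rq_post() appears.

```c
/* Toy model of a receive queue: it only counts posted buffers. */
typedef struct {
    int posted;     /* buffers currently in the RQ */
    int min_seen;   /* lowest depth observed so far */
} ToyRQ;

static void rq_init(ToyRQ *rq, int prepost)
{
    rq->posted = prepost;
    rq->min_seen = prepost;
}

/*
 * An incoming SEND consumes one buffer. Returns -1 if none is posted
 * (on a real reliable connection this would typically surface as a
 * receiver-not-ready condition rather than a clean error return).
 */
static int rq_consume(ToyRQ *rq)
{
    if (rq->posted == 0) {
        return -1;
    }
    rq->posted--;
    if (rq->posted < rq->min_seen) {
        rq->min_seen = rq->posted;
    }
    return 0;
}

/* Stand-in for posting one more receive buffer. */
static void rq_post(ToyRQ *rq)
{
    rq->posted++;
}

/* Run n rounds: receive one message, then repost one buffer. */
static int run_rounds(ToyRQ *rq, int n)
{
    for (int i = 0; i < n; i++) {
        if (rq_consume(rq) != 0) {
            return -1;
        }
        rq_post(rq);
    }
    return 0;
}
```

In this model, starting with one buffer lets the depth drop to 0 between receiving a message and reposting, while starting with two keeps it between 1 and 2, which is the fluctuation described in the suggestion above.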

+Migration of pc.ram:
+===============================
+
+At the beginning of the migration, (migration-rdma.c),
+the sender and the receiver populate the list of RAMBlocks
+to be registered with each other into a structure.

Could you add the packet format here as well please?
Need to document endian-ness etc.

There is no packet format for pc.ram. It's just bytes - raw RDMA
writes of each 4K page, because the memory must be registered
before the RDMA write can begin.

(As discussed, there will be a format for SEND, though - so I'll
take care of that in my next RFC).

Yes, but we also need to report errors detected during migration. We need to document how this is done. We also need to report success.

Acknowledged - I'll document the different error conditions in more detail.

- Michael R. Hines