Re: [Qemu-devel] [RFC PATCH RDMA support v2: 5/6] connection-setup code


From: Orit Wasserman
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v2: 5/6] connection-setup code between client/server
Date: Mon, 18 Feb 2013 10:24:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 02/14/2013 09:29 PM, Michael R. Hines wrote:
> Orit (and Anthony, if you're not busy),
> 
> I forgot to respond to this very important comment:
> 
> On 02/13/2013 03:46 AM, Orit Wasserman wrote:
>> Are you still using the tcp for transferring device state? If so you can 
>> call the tcp functions from the migration rdma code as a first step but I 
>> would prefer it to use RDMA too.
> 
> This is the crux of the problem of using RDMA for migration: currently, all of 
> the QEMU migration control logic and device state goes through the 
> QEMUFile implementation. RDMA, however, is by nature a zero-copy solution and 
> is incompatible with QEMUFile.
> 
> Using RDMA for transferring device state is not recommended: setting up an RDMA 
> transfer requires registering the memory locations on both sides with the RDMA 
> hardware. This is not ideal because it would require pinning the memory 
> holding the device state and then issuing the RDMA transfer for *each* type 
> of device - which would require changing the control path of every type of 
> migrated device in QEMU.
> 
Hi Michael,

The guest device state is quite small (~100K, probably less), especially when 
compared to the guest memory, and we are already pinning the guest memory for 
RDMA anyway.
I was actually wondering about the memory pinning: 
do we pin all guest memory pages when migration starts, or on demand?
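
For reference, this is roughly what the registration call looks like through 
libibverbs (pin_ram_block() and its parameters are made up for illustration, 
not code from the patch); the trade-off is paying the registration cost for 
all of RAM up front versus on first touch of each chunk:

#include <infiniband/verbs.h>
#include <stdio.h>

/* Illustrative only: pin one guest RAM block with the RDMA hardware.
 * 'pd' is an already-created protection domain. */
static struct ibv_mr *pin_ram_block(struct ibv_pd *pd,
                                    void *ram_ptr, size_t ram_size)
{
    /* ibv_reg_mr() pins the pages and returns the lkey/rkey handles the
     * HCA needs for zero-copy transfers. */
    struct ibv_mr *mr = ibv_reg_mr(pd, ram_ptr, ram_size,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr) {
        perror("ibv_reg_mr");
        return NULL;
    }
    printf("pinned %zu bytes, rkey=0x%x\n", ram_size, mr->rkey);
    return mr;
}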

> Currently the patch we submitted bypasses QEMUFile. It just issues the 
> RDMA transfer for the memory that was dirtied and then continues along with 
> the rest of the migration call path normally.
> 
> In an ideal world, we would prefer a hybrid approach, something like:
> 
> *Begin Migration Iteration Round:*
> 1. stop VCPU

The vcpus are only stopped in the last phase; iteration is done while the guest 
is running ...

> 2. start iterative pass over memory
> 3. send control signals (if any) / device state to QEMUFile

device state is only sent in the last phase

> 4. When a dirty memory page is found, do:
>      a) Instruct the QEMUFile to block

If there is nothing to send, there is no need to block ...

>      b) Issue the RDMA transfer
>      c) Instruct the QEMUFile to unblock
> 5. resume VCPU
no need.
> 
> This would require a "smarter" QEMUFile implementation that understands when 
> to block and for how long.
> 

For the guest memory pages, sending the pages directly without QEMUFile (which 
does buffering) is better;
I would suggest implementing a QemuRDMAFile for this.
It would have a new zero-copy API for the memory pages, so instead of using 
qemu_put_buffer 
we would call qemu_rdma_buffer, or it could reimplement qemu_put_buffer (you need 
to add an offset).
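
Roughly what I have in mind is sketched below; the names QEMURDMAState and 
qemu_rdma_buffer are only illustrative, not existing QEMU code, and the sketch 
assumes guest RAM was registered up front and that the destination's base 
address and rkey were exchanged during connection setup:

#include <infiniband/verbs.h>
#include <stdint.h>

typedef struct QEMURDMAState {
    struct ibv_qp *qp;        /* connected RC queue pair */
    struct ibv_mr *ram_mr;    /* guest RAM registered up front */
    uint64_t remote_base;     /* base of guest RAM on the destination */
    uint32_t rkey;            /* destination's rkey for that region */
} QEMURDMAState;

/* Zero-copy replacement for qemu_put_buffer(): 'offset' is the page's
 * offset inside the registered RAM block, so the destination address can
 * be computed without any intermediate copy. */
static int qemu_rdma_buffer(QEMURDMAState *s, const uint8_t *buf,
                            uint64_t offset, int size)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = size,
        .lkey   = s->ram_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;

    wr.wr.rdma.remote_addr = s->remote_base + offset;
    wr.wr.rdma.rkey        = s->rkey;

    return ibv_post_send(s->qp, &wr, &bad_wr) ? -1 : 0;
}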

As for the device state, which is sent in the last phase and is small, you can 
modify the current implementation.
(Well, Paolo sent patches that are changing this, but I think buffering is still 
an option.) 
The current migration implementation copies the device state into a buffer
and then sends the data from the buffer (QemuBufferedFile).
You only need to pin this buffer and RDMA it after all the device state has been 
written into it.
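
A rough sketch of that last step, again with made-up names, assuming the 
destination exposes a pinned staging area whose address and rkey were exchanged 
during connection setup:

#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: pin the device-state staging buffer once and push it
 * with a single RDMA write after all device state has been written to it. */
static int rdma_send_device_state(struct ibv_pd *pd, struct ibv_qp *qp,
                                  uint8_t *staging, size_t filled,
                                  uint64_t remote_addr, uint32_t remote_rkey)
{
    /* Device state is small (~100K), so pinning this buffer is cheap
     * compared to pinning guest RAM. */
    struct ibv_mr *mr = ibv_reg_mr(pd, staging, filled,
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr) {
        return -1;
    }

    struct ibv_sge sge = {
        .addr   = (uintptr_t)staging,
        .length = (uint32_t)filled,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr)) {
        ibv_dereg_mr(mr);
        return -1;
    }
    /* The caller polls the completion queue and only then calls
     * ibv_dereg_mr(mr) and finishes the migration stream. */
    return 0;
}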

Regards,
Orit
> Comments?
> 
> - Michael



