Re: [Qemu-devel] Migration To-do list


From: Hudzia, Benoit
Subject: Re: [Qemu-devel] Migration To-do list
Date: Wed, 14 Nov 2012 10:07:15 +0000

Inline 

> -----Original Message-----
> From: Isaku Yamahata [mailto:address@hidden]
> Sent: 14 November 2012 02:23
> To: Hudzia, Benoit
> Cc: address@hidden; qemu-devel qemu-devel; Orit Wasserman;
> address@hidden; Michael Roth
> Subject: Re: Migration To-do list
> 
> On Tue, Nov 13, 2012 at 05:46:13PM +0000, Hudzia, Benoit wrote:
> > Hi,
> >
> > One concept we have been playing around with in the context of hybrid
> > and post-copy migration, and which might make sense if you are orienting
> > your effort toward RDMA / post-copy, is to move most of the logic to the
> > destination side.
> >
> > This is one thing you might want to consider, as it can solve some of
> > the issues you currently have and would allow you to maintain almost a
> > single API / protocol once the post-copy approach is integrated.
> >
> > The idea is to drive the migration from the destination side, i.e. the
> > pages are pulled by the destination instead of being pushed from the
> > source side.
> >
> > Ex: current pre-copy:
> >
> >     * extract the dirty bitmap (extraction can be scheduled or
> >       triggered by the destination)
> >     * send it to the destination side
> >     * have the destination iterate over the bitmap (page
> >       prioritization can be done here)
> 
> IIRC you mentioned page prioritization last year, but not this year.
> Is it still supported?
> Where is it implemented, in QEMU or in the kernel?

It is in QEMU; it is too expensive and specialised to do that within the
kernel.
I think Orit did some work regarding this aspect, however I am not 100% sure
it is in the stable branch yet.

> 
> 
> >     * depending on the protocol:
> >             _ with a standard socket (or RDS):
> >                     . destination: request page(s) <- can be batched
> >                       (see the sketch after this list)
> >                     . source receives the request and sends back the
> >                       page(s)
> >                     . destination processes them
> >             _ with RDMA:
> >                     . destination reads the page from the source into a
> >                       local page (the pages were mapped for RDMA at
> >                       bitmap extraction; RDMA supports scatter/gather)
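
To make the socket variant concrete, here is a minimal sketch of the
destination pulling pages against the received dirty bitmap. The message
layout, the batch size, and the absence of error handling and endianness
conversion are illustrative assumptions, not an actual wire protocol:

/* Destination-driven pull over a plain socket: walk the dirty bitmap,
 * batch up to BATCH_MAX dirty pages per request, pull their contents. */
#include <stdint.h>
#include <sys/socket.h>

#define PAGE_SIZE   4096
#define BATCH_MAX   64

struct page_request {            /* one batched request message */
    uint32_t count;              /* number of page frame numbers below */
    uint64_t pfns[BATCH_MAX];    /* guest page frame numbers wanted */
};

static void pull_dirty_pages(int sock, const unsigned long *bitmap,
                             uint64_t nr_pages, uint8_t *guest_ram)
{
    struct page_request req = { .count = 0 };

    for (uint64_t pfn = 0; pfn < nr_pages; pfn++) {
        if (!(bitmap[pfn / (8 * sizeof(unsigned long))] &
              (1UL << (pfn % (8 * sizeof(unsigned long)))))) {
            continue;            /* page is clean, skip it */
        }
        /* prioritization could reorder req.pfns before the batch is sent */
        req.pfns[req.count++] = pfn;
        if (req.count == BATCH_MAX || pfn == nr_pages - 1) {
            /* fixed-size message for simplicity; unused slots are sent too */
            send(sock, &req, sizeof(req), 0);
            for (uint32_t i = 0; i < req.count; i++) {
                /* source streams the pages back in request order */
                recv(sock, guest_ram + req.pfns[i] * PAGE_SIZE,
                     PAGE_SIZE, MSG_WAITALL);
            }
            req.count = 0;
        }
    }
}
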
> 
> Although I'm not familiar with RDMA, RDMA requires the exchange of DMA
> addresses between sender and receiver in advance, and pinning down pages.
> Is that correct?

Yes, that is correct. This is why you would register the memory only when
the page is dirtied, avoiding pinning large amounts of memory for too long
(and unpinning upon RDMA read confirmation).
The address is the same as the one within virtual memory. What you exchange
is a combination of the RDMA key (to uniquely identify the memory region
you are sharing) and the start address / offset of the MR. Then you can
read and write at will within it. That is why it is a little bit tricky:
RDMA reads and writes typically do not trigger any notification (CPU / OS
etc. are all bypassed), so the page content can change without the
process / OS knowing it.
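
To illustrate, a minimal sketch of that register-on-dirty / read / unpin
cycle with libibverbs. Connection setup (QP creation, control channel,
rkey/address exchange) is assumed to already exist, and the function names
are placeholders rather than anything from an actual patch:

#include <stdint.h>
#include <infiniband/verbs.h>

/* Source side: pin a dirtied page and expose it for remote reads.
 * The returned mr->rkey plus the page's virtual address are what the
 * destination needs; both travel over the control channel. */
static struct ibv_mr *expose_dirty_page(struct ibv_pd *pd, void *page,
                                        size_t len)
{
    return ibv_reg_mr(pd, page, len,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
}

/* Destination side: pull the remote page straight into local guest RAM
 * with an RDMA read. No CPU/OS involvement on the source, which is
 * exactly why no notification is triggered there. */
static int pull_page_rdma(struct ibv_qp *qp, struct ibv_mr *local_mr,
                          void *local_page, size_t len,
                          uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_page,
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,
        .send_flags = IBV_SEND_SIGNALED, /* completion drives the read ack */
    };
    struct ibv_send_wr *bad_wr;

    wr.wr.rdma.remote_addr = remote_addr; /* virtual address on the source */
    wr.wr.rdma.rkey        = rkey;        /* key exchanged for this MR */
    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Source side, once the destination confirms the read completed:
 * unpin so the page is not held down longer than necessary. */
static void unpin_page(struct ibv_mr *mr)
{
    ibv_dereg_mr(mr);
}
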

> 
> 
> >             _ with post-copy:
> >                     . pretty much the same, but the dirty bitmap reset
> >                       is done in the kernel during the post-copy
> >                       operation (provides better dirty-bit tracking
> >                       granularity)
> >
> >
> > Disadvantage:
> >     * adds a round trip, which can be compensated for by batching
> >       operations (only with a standard socket)
> >
> > Advantages:
> >     * most of the heavy lifting is done on the destination side,
> >       leaving the source to respond to requests in an event-driven
> >       fashion
> >     * resolves a lot of the issues you have with threading on the
> >       sender side (accounting etc.)
> >     * extremely friendly to optimised solutions
> >     * if bitmap generation is expensive, we can overlap generation
> >       with transfer, creating a semi-continuous delivery of bitmaps
> >       and guaranteeing an uninterrupted, optimised flow => we decouple
> >       bitmap generation from the send / receive operation (see the
> >       sketch after this list)
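
A minimal sketch of that decoupling: a generator thread keeps producing
fresh dirty bitmaps while the transfer loop consumes the previous one.
extract_dirty_bitmap() and transfer_round() are hypothetical placeholders
for the extraction and the request/transfer round above:

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
static unsigned long *next_bitmap;   /* slot filled by the generator */
static bool done;

unsigned long *extract_dirty_bitmap(void);     /* expensive, placeholder */
bool transfer_round(const unsigned long *bmp); /* false once converged  */

static void *bitmap_generator(void *arg)
{
    (void)arg;
    for (;;) {
        unsigned long *bmp = extract_dirty_bitmap(); /* overlaps transfer */
        pthread_mutex_lock(&lock);
        if (done) {
            pthread_mutex_unlock(&lock);
            free(bmp);
            return NULL;
        }
        free(next_bitmap);        /* drop a stale, unconsumed bitmap */
        next_bitmap = bmp;
        pthread_cond_signal(&ready);
        pthread_mutex_unlock(&lock);
    }
}

static void transfer_loop(void)
{
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!next_bitmap) {
            pthread_cond_wait(&ready, &lock);
        }
        unsigned long *bmp = next_bitmap;
        next_bitmap = NULL;
        pthread_mutex_unlock(&lock);

        /* runs while the next bitmap is already being generated */
        bool more = transfer_round(bmp);
        free(bmp);
        if (!more) {
            pthread_mutex_lock(&lock);
            done = true;          /* tell the generator to stop */
            pthread_mutex_unlock(&lock);
            return;
        }
    }
}
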
> >
> >
> >
> > Anyway, I will notify you as soon as I have the patch / library
> > available for RDMA / post-copy.
> >
> > A note on the fault tolerance part: this requires a lot more heavy code
> > optimisation and poking around to guarantee efficient checkpointing.
> > Most of the solutions we tested so far (Remus and an old version of
> > Kemari) scale poorly. Again, an RDMA / post-copy solution is more or
> > less necessary when you talk about checkpointing enterprise-class
> > applications.
> 
> IIRC the Kemari guys evaluated the IB case. I'm not sure whether it was
> with RDMA or IPoIB.
> 
> thanks,
> >
> >
> > Regards
> > Benoit
> >
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Juan Quintela [mailto:address@hidden]
> > > Sent: 13 November 2012 16:19
> > > To: qemu-devel qemu-devel; Orit Wasserman; address@hidden;
> > > Hudzia, Benoit; Isaku Yamahata; Michael Roth
> > > Subject: Migration ToDo list
> > >
> > >
> > > Hi
> > >
> > > If you have anything else to put, please add.
> > >
> > > Migration Thread
> > > * Plan is to integrate it as one of the first things in December (me)
> > > * Remove copies with buffered file (me)
> > >
> > > Bitmap Optimization
> > > * Finish moving to individual bitmaps for migration/vga/code
> > > * Make sure we don't copy things around
> > > * Shared memory bitmap with kvm?
> > > * Move to 2MB pages bitmap and then fine grain?
> > >
> > > QIDL
> > > * Review the patches (me)
> > >
> > > PostCopy
> > > * Review patches?
> > > * See what we can already integrate?
> > >   I remember from last year that we could integrate the first third or so
> > >
> > > RDMA
> > > * Send the RDMA/tcp/... library they already have (Benoit)
> > > * This is required for postcopy
> > > * This can be used for precopy
> > >
> > > General
> > > * Change protocol to:
> > >   a) always being 16-byte aligned (Paolo said that is faster)
> > >   b) do scatter/gather of the pages?
> > >
> > > Fault Tolerance
> > > * That is built on top of migration code, but I have nothing to add.
> > >
> > > Any more ideas?
> > >
> > > Later, Juan.
> >
> 
> --
> yamahata


