Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protoc

qemu-devel

From:	Michael S. Tsirkin
Subject:	Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date:	Thu, 11 Apr 2013 18:44:24 +0300

On Thu, Apr 11, 2013 at 11:18:56AM -0400, Michael R. Hines wrote:
> First of all,

I know it's a hard habit to break but could you
please stop top-posting?

> this whole argument should not even exist for the
> following reason:
> 
> Page registrations are supposed to be *rare* - once a page is registered, it
> is registered for life. There is nothing in the design that says a page must
> be "unregistered" and I do not believe anybody is proposing that.

Hmm, proposing what? Of course you need to unregister pages
eventually, otherwise your pinned memory on the destination
will just grow indefinitely. People often use
registration caches to help reduce the overhead,
but never unregistering seems too aggressive.
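
To make the registration-cache idea concrete, here is a minimal sketch in
Python. It is only an illustration, not code from the patch series: the
class name, the max_pinned limit, and the register/unregister callbacks are
all hypothetical stand-ins for whatever pins and unpins memory (ibv_reg_mr
and ibv_dereg_mr in libibverbs, for instance). It assumes a simple LRU
policy, so pinned memory stays bounded while recently used chunks avoid
re-registration.

```python
from collections import OrderedDict


class RegistrationCache:
    """LRU cache of registered (pinned) chunks with a cap on how many stay pinned."""

    def __init__(self, max_pinned, register, unregister):
        self.max_pinned = max_pinned
        self.register = register      # hypothetical callback: chunk_id -> key
        self.unregister = unregister  # hypothetical callback: chunk_id -> None
        self.entries = OrderedDict()  # chunk_id -> key, least recently used first

    def get(self, chunk_id):
        """Return the key for chunk_id, registering it (and evicting) if needed."""
        if chunk_id in self.entries:
            # Cache hit: no registration cost, just mark as recently used.
            self.entries.move_to_end(chunk_id)
            return self.entries[chunk_id]
        # Cache miss: unpin the least recently used chunks until there is room.
        while len(self.entries) >= self.max_pinned:
            victim, _ = self.entries.popitem(last=False)
            self.unregister(victim)
        key = self.register(chunk_id)
        self.entries[chunk_id] = key
        return key
```

With max_pinned set to the number of chunks in guest RAM this degenerates
into "registered for life"; a smaller limit trades some re-registration
cost for a bound on pinned memory.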

You mean the chunk-based thing just delays the agony
until all guest memory is pinned for RDMA anyway?
Wait, is it registered for life on the source too?

Well this kind of explains why qemu was dying on OOM,
doesn't it?

> Second, this means that my previous analysis showing that performance
> was reduced was also incorrect because most of the RDMA transfers were
> against pages during the bulk phase round, which incorrectly makes
> dynamic page registration look bad. I should have done more testing
> *after* the bulk phase round, and I apologize for not doing that.
> 
> Indeed when I do such a test (with the 'stress' command) the cost of
> page registration disappears because most of the registrations have
> already completed a long time ago.
> 
> Thanks, Paolo, for reminding us about the bulk-phase behavior to begin with.
> 
> Third, this means that optimizing this protocol would not be helpful
> and that we should follow the "keep it simple" approach because during
> the steady-state phase of the migration most of the pages should have
> already been registered.
> 
> - Michael

If you mean that registering all memory is a requirement,
then I am not sure I agree: you wrote one slow protocol, this
does not mean that there can't be a fast one.

But if you mean to say that the current chunk-based code
is useless, then I'd have to agree.

> 
> On 04/11/2013 10:37 AM, Michael S. Tsirkin wrote:
> >Answer above.
> >
> >Here's how things are supposed to work in a pipeline:
> >
> >req -> registration request
> >res -> response
> >done -> rdma done notification (remote can unregister)
> >pgX  -> page, or chunk, or whatever unit is used
> >         for registration
> >rdma -> one or more rdma write requests
> >
> >
> >
> >pg1 ->  pin -> req -> res -> rdma -> done
> >         pg2 ->  pin -> req -> res -> rdma -> done
> >                 pg3 -> pin -> req -> res -> rdma -> done
> >                        pg4 -> pin -> req -> res -> rdma -> done
> >                               pg5 -> pin -> req -> res -> rdma -> done
> >
> >
> >
> >It's like an assembly line, see?  So while software does the registration
> >roundtrip dance, hardware is processing rdma requests for previous
> >chunks.
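
The overlap described in the quoted pipeline can be estimated with a small
timing model. This is a rough sketch under stated assumptions, not a
measurement: it collapses pin + req + res into one software stage costing
t_reg per chunk and the rdma writes into one hardware stage costing t_rdma
per chunk, and it lets each stage work on one chunk at a time. The function
names and parameters are made up for illustration.

```python
def serial_time(n, t_reg, t_rdma):
    """Total time when each chunk is registered and written before the next starts."""
    return n * (t_reg + t_rdma)


def pipelined_time(n, t_reg, t_rdma):
    """Total time when registration of the next chunk overlaps the current RDMA write."""
    reg_done = 0.0   # time at which the software stage finishes the current chunk
    rdma_done = 0.0  # time at which the hardware stage finishes the current chunk
    for _ in range(n):
        reg_done += t_reg
        # RDMA for this chunk starts once it is registered and the hardware is free.
        rdma_done = max(reg_done, rdma_done) + t_rdma
    return rdma_done


if __name__ == "__main__":
    print(serial_time(4, 2, 3), pipelined_time(4, 2, 3))
```

In this model the pipelined total approaches n times the slower stage, so
registration cost is hidden whenever it is no larger than the RDMA cost.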
> >
> >....
> >
> >When do you have to stall? When you run out of rx buffer credits so you
> >cannot start a new req.  Your protocol has 2 outstanding buffers,
> >so you can only have one req in the air. Do more and
> >you will not need to stall - possibly at all.
> >
> >One other minor point is that your protocol requires extra explicit
> >ready commands. You can pass the number of rx buffers as extra payload
> >in the traffic you are sending anyway, and reduce that overhead.
> >
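
The credit argument in the quoted text can be illustrated with a toy
simulation. Everything here is an assumption made for the sketch: time
advances in abstract ticks, the sender may issue one request per tick when
it holds a credit, and each response arrives rtt ticks later and hands one
credit back as part of the response itself rather than through a separate
"ready" message. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(num_reqs, credits, rtt):
    """Return (total ticks, stalled ticks) for a sender limited by rx buffer credits."""
    in_flight = deque()  # arrival ticks of responses still on the wire
    sent = 0
    stalls = 0
    tick = 0
    while sent < num_reqs or in_flight:
        # Responses arriving now return their credit to the sender.
        while in_flight and in_flight[0] <= tick:
            in_flight.popleft()
            credits += 1
        if sent < num_reqs:
            if credits > 0:
                credits -= 1
                in_flight.append(tick + rtt)
                sent += 1
            else:
                stalls += 1  # no credit available: cannot start a new req
        tick += 1
    return tick, stalls


if __name__ == "__main__":
    for c in (1, 2, 3):
        print(c, simulate(4, c, 3))
```

With a single credit the sender stalls for most of every round trip; once
the number of credits covers the round-trip time, the stalls in this model
go away.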
