qemu-block

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling


From: Peter Xu
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Date: Wed, 29 May 2024 12:33:31 -0400

Lei,

On Wed, May 29, 2024 at 02:43:46AM +0000, Gonglei (Arei) wrote:
> For rdma programming, the current mainstream implementation is to use
> rdma_cm to establish a connection, and then use verbs to transmit data.
> rdma_cm and ibverbs create two FDs respectively. The two FDs have
> different responsibilities. rdma_cm fd is used to notify connection
> establishment events, and verbs fd is used to notify new CQEs. When
> poll/epoll monitoring is directly performed on the rdma_cm fd, only a
> pollin event can be monitored, which means that an rdma_cm event
> occurs. When the verbs fd is directly polled/epolled, only the pollin
> event can be listened, which indicates that a new CQE is generated.
>
> Rsocket is a sub-module attached to the rdma_cm library and provides
> rdma calls that are completely similar to socket interfaces. However,
> this library returns only the rdma_cm fd for listening to link
> setup-related events and does not expose the verbs fd (readable and
> writable events for listening to data). Only the rpoll interface provided
> by the RSocket can be used to listen to related events. However, QEMU
> uses the ppoll interface to listen to the rdma_cm fd (gotten by raccept
> API).  And cannot listen to the verbs fd event. Only some hacking methods
> can be used to address this problem.  Do you guys have any ideas? Thanks.

I saw that you mentioned this elsewhere:

> Right. But the question is QEMU does not use rpoll but glib's ppoll. :(

What I'm thinking may not make much sense: as I mentioned, I don't really
know rdma at all, and my idea also involves coroutine stuff, which I don't
know well either.  But here it is, just in case it sheds some light in some
form.

IIUC this is how we block on an iochannel, for both read and write:

        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                qio_channel_yield(ioc, G_IO_XXX);
            } else {
                qio_channel_wait(ioc, G_IO_XXX);
            }
            continue;
        }

One thing I'm wondering is whether we can provide a new feature bit for
qiochannel, e.g., QIO_CHANNEL_FEATURE_POLL, so that the iochannel can
define its own poll routine rather than using the default when possible.

I think it may not work if we're in a coroutine, as I guess that'll block
other fds from being woken up.  Hence it should look like this:

        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                qio_channel_yield(ioc, G_IO_XXX);
            } else if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_POLL)) {
                qio_channel_poll(ioc, G_IO_XXX);
            } else {
                qio_channel_wait(ioc, G_IO_XXX);
            }
            continue;
        }

Maybe we even want to forbid such a channel from being used in a coroutine
at all, as having QIO_CHANNEL_FEATURE_POLL set may mean that this iochannel
simply won't work with poll(), like in rdma's use case.

Then rdma iochannel can implement qio_channel_poll() using rpoll().
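
To make the shape of this a bit more concrete, below is a rough,
self-contained sketch of the dispatch.  Everything named Demo*/demo_* is
made up for illustration and is not QEMU's QIOChannel API; the custom hook
uses plain poll() as a stand-in, where an rsocket-backed channel would
presumably call rpoll() (from <rdma/rsocket.h>) instead.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bit, standing in for QIO_CHANNEL_FEATURE_POLL. */
enum { DEMO_FEATURE_POLL = 1u << 0 };

/* Hypothetical channel type; NOT the real QIOChannel. */
typedef struct DemoChannel DemoChannel;
struct DemoChannel {
    int fd;
    unsigned features;
    /* Optional channel-specific wait routine. */
    int (*poll_fn)(DemoChannel *ch, short events, int timeout_ms);
    int custom_polls; /* how many times poll_fn ran (demo only) */
};

static bool demo_has_feature(const DemoChannel *ch, unsigned feature)
{
    return (ch->features & feature) != 0;
}

/* Default path: wait on the fd with plain poll(2). */
static int demo_default_wait(DemoChannel *ch, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = ch->fd, .events = events, .revents = 0 };
    return poll(&pfd, 1, timeout_ms);
}

/*
 * Custom path.  Assumption: a real rdma channel would call rpoll() here,
 * which takes the same arguments as poll().  poll() is used as a
 * stand-in so that the sketch builds without librdmacm.
 */
static int demo_custom_poll(DemoChannel *ch, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = ch->fd, .events = events, .revents = 0 };
    ch->custom_polls++;
    return poll(&pfd, 1, timeout_ms);
}

/* Non-coroutine wait: prefer the channel's own poll routine if it has one. */
static int demo_wait_ready(DemoChannel *ch, short events, int timeout_ms)
{
    if (demo_has_feature(ch, DEMO_FEATURE_POLL)) {
        assert(ch->poll_fn != NULL);
        return ch->poll_fn(ch, events, timeout_ms);
    }
    return demo_default_wait(ch, events, timeout_ms);
}
```

A channel without the feature bit falls through to the default wait, while
one that sets DEMO_FEATURE_POLL and fills in poll_fn gets its own routine
called.  The coroutine branch is left out on purpose, since the point is
that such a channel likely can't be used there.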

There's one other dependent issue here, in that I _think_ the migration recv
side is still in a coroutine.. so we may need to move that into a thread
first.  IIRC we don't yet have a major blocker to do that, but I didn't
check further either.  I've put that issue aside for now just to see whether
this idea makes sense at all.

Thanks,

-- 
Peter Xu



