qemu-block

RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling


From: Sean Hefty
Subject: RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Date: Thu, 30 May 2024 18:23:03 +0000

> > For rdma programming, the current mainstream implementation is to use
> > rdma_cm to establish a connection, and then use verbs to transmit data.
> >
> > rdma_cm and ibverbs create two FDs respectively. The two FDs have
> > different responsibilities. rdma_cm fd is used to notify connection
> > establishment events, and verbs fd is used to notify new CQEs. When
> > poll/epoll monitoring is directly performed on the rdma_cm fd, only a
> > pollin event can be monitored, which means that an rdma_cm event occurs.
> > When the verbs fd is directly polled/epolled, only the pollin event can
> > be listened, which indicates that a new CQE is generated.
> >
> > Rsocket is a sub-module attached to the rdma_cm library and provides
> > rdma calls that are completely similar to socket interfaces. However,
> > this library returns only the rdma_cm fd for listening to link
> > setup-related events and does not expose the verbs fd (readable and
> > writable events for listening to data). Only the rpoll interface provided
> > by the RSocket can be used to listen to related events. However, QEMU
> > uses the ppoll interface to listen to the rdma_cm fd (gotten by raccept
> > API).
> > And cannot listen to the verbs fd event. Only some hacking methods can be
> > used to address this problem.
> >
> > Do you guys have any ideas? Thanks.

The current rsocket code allows calling rpoll() with non-rsocket fd's, so an 
app can use rpoll() directly in place of poll().  It may be easiest to add an 
rppoll() call to rsockets and call that when using RDMA.
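
As a rough illustration of the "use rpoll() in place of poll()" idea, here is 
a minimal sketch.  It assumes only that rpoll() has the same signature as 
poll(), so the wait routine takes the poll function as a parameter.  The names 
poll_fn and wait_either are hypothetical and are not part of rsockets or QEMU.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Signature shared by poll() and, by assumption, rpoll(). */
typedef int (*poll_fn)(struct pollfd *fds, nfds_t nfds, int timeout);

/*
 * Wait until one of two descriptors becomes readable.
 * Returns 0 if fd_a is readable, 1 if only fd_b is readable,
 * and -1 on timeout or error.  (Hypothetical helper.)
 */
static int wait_either(poll_fn do_poll, int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    int n;

    assert(do_poll != NULL);

    n = do_poll(fds, 2, timeout_ms);
    if (n <= 0)
        return -1;              /* timeout (0) or error (-1) */
    if (fds[0].revents & POLLIN)
        return 0;
    if (fds[1].revents & POLLIN)
        return 1;
    return -1;                  /* only error/hangup bits were set */
}
```

With plain fds the caller would pass poll.  When one of the descriptors is an 
rsocket, the caller would pass rpoll from <rdma/rsocket.h> instead and link 
against librdmacm.  An rppoll() would presumably follow the ppoll() signature 
in the same way, but that call does not exist today.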

In case the easy path isn't feasible:

An extension could allow extracting the actual fd's under an rsocket, in order 
to allow a user to call poll()/ppoll() directly.  But it would be non-trivial.

The 'fd' that represents an rsocket happens to be the fd related to the RDMA 
CM.  That's because an rsocket needs a unique integer value to report as an 
'fd' value which will not conflict with any other fd value that the app may 
have.  I would consider the fd value an implementation detail, rather than 
something which an app should depend upon.  (For example, the 'fd' value 
returned for a datagram rsocket is actually a UDP socket fd).

Once an rsocket is in the connected state, it's possible an extended 
rgetsockopt() or rfcntl() call could return the fd related to the CQ.  But if 
an app tried to call poll() on that fd, the results would not be as expected.  
For example, it's possible for data to be available to receive on the rsocket 
without the CQ fd being signaled.  Calling poll() on the CQ fd in this state 
could leave the app hanging.  This is a natural result of races in the RDMA CQ 
signaling.  If you look at the rsocket rpoll() implementation, you'll see that 
it checks for data prior to sleeping.

For an app to safely wait in poll/ppoll on the CQ fd, it would need to invoke 
some sort of 'pre-poll' routine, which would perform the same checks done in 
rpoll() prior to blocking.  As a reference to a similar pre-poll routine, see 
the fi_trywait() call from this man page: 

https://ofiwg.github.io/libfabric/v1.21.0/man/fi_poll.3.html

This is for a different library but deals with the same underlying problem.  
Obviously adding an rtrywait() to rsockets is possible but wouldn't align with 
any socket API equivalent.
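
To make the pre-poll idea concrete, here is a toy, single-threaded sketch in 
plain POSIX C.  It does not use rsockets or verbs at all: a pipe stands in for 
the CQ fd, and a counter stands in for data already received in user space.  
The pipe is written only while notifications are armed, which is meant to 
loosely mimic one-shot CQ notification.  Every name here (toy_chan, 
toy_trywait, and so on) is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Toy stand-in for a connection whose data can arrive without fd activity. */
struct toy_chan {
    int  notify_rd;  /* fd the app would pass to poll()/ppoll() */
    int  notify_wr;  /* written by the "provider" side when armed */
    int  pending;    /* bytes already receivable in user space */
    bool armed;      /* one-shot notification request outstanding */
};

static int toy_init(struct toy_chan *c)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    c->notify_rd = p[0];
    c->notify_wr = p[1];
    c->pending = 0;
    c->armed = false;
    return 0;
}

/* Provider side: queue data, and signal the fd only if armed. */
static void toy_deliver(struct toy_chan *c, int nbytes)
{
    assert(nbytes > 0);
    c->pending += nbytes;
    if (c->armed) {
        ssize_t r;

        c->armed = false;               /* notification is one-shot */
        r = write(c->notify_wr, "!", 1);
        (void)r;
    }
}

/* Pre-poll check: arm first, then re-check.  True means "safe to block". */
static bool toy_trywait(struct toy_chan *c)
{
    c->armed = true;
    return c->pending == 0;
}

/* Returns 1 if data is available, 0 if the wait timed out. */
static int toy_wait(struct toy_chan *c, int timeout_ms)
{
    struct pollfd pfd = { .fd = c->notify_rd, .events = POLLIN };

    if (!toy_trywait(c))
        return 1;                       /* data already there, do not sleep */
    if (poll(&pfd, 1, timeout_ms) > 0) {
        char b;
        ssize_t r = read(c->notify_rd, &b, 1);  /* drain the notification */
        (void)r;
    }
    return c->pending > 0 ? 1 : 0;
}
```

If data is delivered before the first toy_trywait(), a bare poll() on 
notify_rd reports nothing even though data is pending, while toy_wait() 
returns immediately.  A real version would also have to handle concurrent 
threads, which this sketch ignores.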

- Sean
