[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH v4 07/14] vfio-user: run vfio-user context
|
From: |
Thanos Makatos |
|
Subject: |
RE: [PATCH v4 07/14] vfio-user: run vfio-user context |
|
Date: |
Wed, 5 Jan 2022 10:38:10 +0000 |
> -----Original Message-----
> From: Jag Raman <jag.raman@oracle.com>
> Sent: 17 December 2021 18:00
> To: Stefan Hajnoczi <stefanha@redhat.com>; John Levon
> <john.levon@nutanix.com>; Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: qemu-devel <qemu-devel@nongnu.org>; Alex Williamson
> <alex.williamson@redhat.com>; Marc-André Lureau
> <marcandre.lureau@gmail.com>; Philippe Mathieu-Daudé
> <philmd@redhat.com>; pbonzini@redhat.com; alex.bennee@linaro.org;
> thuth@redhat.com; crosa@redhat.com; wainersm@redhat.com;
> bleal@redhat.com; Elena Ufimtseva <elena.ufimtseva@oracle.com>; John
> Levon <john.levon@nutanix.com>; John Johnson
> <john.g.johnson@oracle.com>; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle <swapnil.ingle@nutanix.com>
> Subject: Re: [PATCH v4 07/14] vfio-user: run vfio-user context
>
>
>
> > On Dec 16, 2021, at 6:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote:
> >> @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj,
> const char *str, Error **errp)
> >> vfu_object_init_ctx(o, errp);
> >> }
> >>
> >> +static void vfu_object_ctx_run(void *opaque)
> >> +{
> >> + VfuObject *o = opaque;
> >> + int ret = -1;
> >> +
> >> + while (ret != 0) {
> >> + ret = vfu_run_ctx(o->vfu_ctx);
> >> + if (ret < 0) {
> >> + if (errno == EINTR) {
> >> + continue;
> >> + } else if (errno == ENOTCONN) {
> >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> >> + o->vfu_poll_fd = -1;
> >> + object_unparent(OBJECT(o));
> >> + break;
> >
> > If nothing else logs a message then I think that should be done here so
> > users know why their vfio-user server object disappeared.
>
> Sure will do.
>
> Do you prefer a trace, or a message to the console? Trace makes sense to me.
> Presently, the client could unplug the vfio-user device which would trigger
> the
> deletion of this object. This process could happen quietly.
>
> >
> >> + } else {
> >> + error_setg(&error_abort, "vfu: Failed to run device %s -
> >> %s",
> >> + o->device, strerror(errno));
> >
> > error_abort is equivalent to assuming !o->daemon. In the case where the
> > user doesn't want to automatically shut down the process we need to log
> > a message without aborting.
>
> OK, makes sense.
>
> >
> >> + break;
> >
> > Indentation is off.
> >
> >> + }
> >> + }
> >> + }
> >> +}
> >> +
> >> +static void vfu_object_attach_ctx(void *opaque)
> >> +{
> >> + VfuObject *o = opaque;
> >> + GPollFD pfds[1];
> >> + int ret;
> >> +
> >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> >> +
> >> + pfds[0].fd = o->vfu_poll_fd;
> >> + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> >> +
> >> +retry_attach:
> >> + ret = vfu_attach_ctx(o->vfu_ctx);
> >> + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> >> + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
> >> + goto retry_attach;
> >
> > This can block the thread indefinitely. Other events like monitor
> > commands are not handled in this loop. Please make this asynchronous
> > (set an fd handler and return from this function so we can try again
> > later).
> >
> > The vfu_attach_ctx() implementation synchronously negotiates the
> > vfio-user connection :(. That's a shame because even if accept(2) is
> > handled asynchronously, the negotiation can still block. It would be
> > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to
> > avoid blocking. Is that possible?
>
> Thanos / John,
>
> Any thoughts on this?
I'm discussing this with John and FYI there are other places where libvfio-user
can block, e.g. sending a response or receiving a command. Is it just the
negotiation you want it to be asynchronous or _all_ libvfio-user operations?
Making libvfio-user fully asynchronous might require a substantial API rewrite.
>
> >
> > If libvfio-user can't make vfu_attach_ctx() fully async then it may be
> > possible to spawn a thread just for the blocking vfu_attach_ctx() call
> > and report the result back to the event loop (e.g. using EventNotifier).
> > That adds a bunch of code to work around a blocking API though, so I
> > guess we can leave the blocking part if necessary.
> >
> > At the very minimum, please make EAGAIN/EWOULDBLOCK async as
> mentioned
> > above and add a comment explaining the situation with the
> > partially-async vfu_attach_ctx() API so it's clear that this can block
> > (that way it's clear that you're aware of the issue and this isn't by
> > accident).
>
> Sure, we could make the attach async at QEMU depending on how the
> library prefers to do this.
>
> >
> >> + } else if (ret < 0) {
> >> + error_setg(&error_abort,
> >> + "vfu: Failed to attach device %s to context - %s",
> >> + o->device, strerror(errno));
> >
> > error_abort assumes !o->daemon. Please handle the o->daemon == true
> > case by logging an error without aborting.
> >
> >> + return;
> >> + }
> >> +
> >> + o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
> >> + if (o->vfu_poll_fd < 0) {
> >> + error_setg(&error_abort, "vfu: Failed to get poll fd %s",
> >> o->device);
> >
> > Same here.
> >
> >> @@ -208,6 +284,8 @@ static void vfu_object_init(Object *obj)
> >> TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
> >> return;
> >> }
> >> +
> >> + o->vfu_poll_fd = -1;
> >> }
> >
> > This must call qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL)
> > when o->vfu_poll_fd != -1 to avoid leaving a dangling fd handler
> > callback registered.
>
> This is during the init phase, and the FD handlers are not set. Do you mean
> to add this at finalize?
>
> I agree it’s good to explicitly add this at finalize. But vfu_destroy_ctx()
> should
> trigger a ENOTCONN, which would do it anyway.
>
> Thank you!
> --
> Jag
- RE: [PATCH v4 07/14] vfio-user: run vfio-user context,
Thanos Makatos <=