From: Eugenio Perez Martin
Subject: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
Date: Thu, 8 Feb 2024 18:20:23 +0100

On Wed, Feb 7, 2024 at 11:18 AM Kevin Wolf <kwolf@redhat.com> wrote:
>
> > On 06.02.2024 at 17:44, Eugenio Perez Martin wrote:
> > On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > > On 05.02.2024 at 13:22, Eugenio Perez Martin wrote:
> > > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > > >
> > > > > VDUSE requires that virtqueues are first enabled before the
> > > > > DRIVER_OK status flag is set; with the current API of the kernel
> > > > > module, it is impossible to support the opposite order in our block
> > > > > export code because userspace is not notified when a virtqueue is
> > > > > enabled.
> > > > >
> > > > > This requirement also matches the normal initialisation order as done
> > > > > by the generic vhost code in QEMU. However, commit 6c482547
> > > > > accidentally changed the order for vdpa-dev and broke access to VDUSE
> > > > > devices with this.
> > > > >
> > > > > This changes vdpa-dev to use the normal order again and use the
> > > > > standard vhost callback .vhost_set_vring_enable for this. VDUSE
> > > > > devices can be used with vdpa-dev again after this fix.
> > > > >
> > > > > Cc: qemu-stable@nongnu.org
> > > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > ---
> > > > >  hw/virtio/vdpa-dev.c   |  5 +----
> > > > >  hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
> > > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > > index eb9ecea83b..13e87f06f6 100644
> > > > > --- a/hw/virtio/vdpa-dev.c
> > > > > +++ b/hw/virtio/vdpa-dev.c
> > > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
> > > > >
> > > > >      s->dev.acked_features = vdev->guest_features;
> > > > >
> > > > > -    ret = vhost_dev_start(&s->dev, vdev, false);
> > > > > +    ret = vhost_dev_start(&s->dev, vdev, true);
> > > > >      if (ret < 0) {
> > > > >          error_setg_errno(errp, -ret, "Error starting vhost");
> > > > >          goto err_guest_notifiers;
> > > > >      }
> > > > > -    for (i = 0; i < s->dev.nvqs; ++i) {
> > > > > -        vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > > > > -    }
> > > > >      s->started = true;
> > > > >
> > > > >      /*
> > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > index 3a43beb312..c4574d56c5 100644
> > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
> > > > >      return r;
> > > > >  }
> > > > >
> > > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > > > +{
> > > > > +    struct vhost_vdpa *v = dev->opaque;
> > > > > +    unsigned int i;
> > > > > +    int ret;
> > > > > +
> > > > > +    for (i = 0; i < dev->nvqs; ++i) {
> > > > > +        ret = vhost_vdpa_set_vring_ready(v, i);
> > > > > +        if (ret < 0) {
> > > > > +            return ret;
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > > >                                         int fd)
> > > > >  {
> > > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > > > >          .vhost_set_features = vhost_vdpa_set_features,
> > > > >          .vhost_reset_device = vhost_vdpa_reset_device,
> > > > >          .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > > > +        .vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > > > >          .vhost_get_config  = vhost_vdpa_get_config,
> > > > >          .vhost_set_config = vhost_vdpa_set_config,
> > > > >          .vhost_requires_shm_log = NULL,
> > > >
> > > > vhost-vdpa net enables CVQ before the dataplane queues in order to
> > > > configure the whole device at the destination of a live migration.
> > > > Going back to this callback would cause the device to enable the
> > > > dataplane before the virtqueues are configured again.
> > >
> > > Not that it makes a difference, but I don't think it would actually be
> > > going back. Even before your commit 6c482547, we were not making use of
> > > the generic callback but just called the function in a slightly
> > > different place (but less different than after commit 6c482547).
> > >
> > > But anyway... Why don't the other vhost backends need the same for
> > > vhost-net then? Do they just not support live migration?
> >
> > They don't support the control virtqueue. More specifically, the
> > control virtqueue is handled directly in QEMU.
>
> So the network device already has to special case vdpa instead of using
> the same code for all vhost backends? :-/
>
> > > I don't know the code well enough to say where the problem is, but if
> > > vhost-vdpa networking code relies on the usual vhost operations not
> > > being implemented and bypasses VhostOps to replace it, that sounds like
> > > a design problem to me.
> >
> > I don't follow this. What vhost operation is expected not to be implemented?
>
> You were concerned about implementing .vhost_set_vring_enable in
> vdpa_ops like my patch does. So it seems that the networking code
> requires that it is not implemented?
>

Kind of. vdpa_net requires that the enable calls on the dataplane
vrings are done after DRIVER_OK. So as you say in your mail, we're
good with vhost_dev_start, and changes may be focused on vhost_net.c.
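
To make the ordering difference explicit, here is a rough sketch of the
two flows discussed in this thread. It is only an illustration of what
was said above, not code taken from QEMU:

/*
 * VDUSE / generic vdpa-dev:        net/vhost-vdpa.c with CVQ:
 *   enable all vrings                enable CVQ and restore the device
 *   set DRIVER_OK                    state through it
 *                                    enable the dataplane vrings only
 *                                    after DRIVER_OK
 */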

Vhost-vdpa enable/disable semantics are just different from
vhost-user, the backend that originally implemented
.vhost_set_vring_enable. That is the reason why the vhost-vdpa enable
was originally called from vhost_vdpa_dev_start.

Vhost-user also calls that function to enable and disable queues,
while vhost-kernel does not even implement the vhost_op but relies on
vhost_net_set_backend instead (not sure about other kinds of devices).

This patch is moving toward making them equal, which is not correct.
But maybe it is ok to take that tiny step in that direction to fix the
generic vhost-vdpa device and sort the rest out later.

Just for completeness, I already tried to implement it using the
vhost_op, but it was rejected [1]. That was before Stefano's patch for
selectively disabling the vring enable step.

> On the other hand, for vhost-vdpa, the callback seems to be called in
> exactly the right place where virtqueues need to be enabled, like for
> other vhost devices.
>
> > > Maybe VhostOps needs a new operation to enable
> > > just a single virtqueue that can be used by the networking code instead?
> > >
> > > > How does VDUSE userspace know how many queues it should enable? Can't
> > > > the kernel perform the necessary actions after DRIVER_OK, like
> > > > configuring the kick etc.?
> > >
> > > Not sure if I understand the question. The vdpa kernel interface always
> > > enables individual queues, so the VDUSE userspace will enable whatever
> > > queues it was asked to enable. The only restriction is that the queues
> > > need to be enabled before setting DRIVER_OK.
> > >
> > > The interface that enables all virtqueues at once seems to be just
> > > .vhost_set_vring_enable in QEMU.
> >
> > It enables all virtqueues of the same vhost device in QEMU, but QEMU
> > creates one vhost_dev for each vhost-net virtqueue pair and another
> > one for CVQ. This goes back to the time when mq vhost-kernel net
> > devices were implemented by merging many tap devices. net/vhost-vdpa.c
> > only enables the last one, which corresponds to CVQ, and then enables
> > the rest once all messages have been received.
>
> So it's less about the granularity of .vhost_set_vring_enable, which
> would be just right, and more about the order in which it is called
> for the different vhost_devs?
>
> Can we influence the order in some way other than just not implementing
> the callback at all? I think net-specific weirdness should be contained
> in the net implementation and not affect generic vdpa code.
>
> > On the other hand, .vhost_set_vring_enable is also used for queue
> > reset (vhost_net_virtqueue_reset and vhost_net_virtqueue_restart). In
> > other words, it is called after DRIVER_OK has been set. I guess it is
> > fine for VDUSE as long as it does not offer vring reset capabilities,
> > but future-wise, should we start going in that direction?
>
> I don't actually know VDUSE very well, but that would probably make
> sense.
>
> Though for the moment, I just tried to simply attach a VDUSE device and
> failed, so I'm just trying to fix the regression from your commit.
>

Understandable.

> > But kernels without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK should be
> > supported, right? Maybe we can add vhost_vdpa_set_vring_enable and
> > call vhost_dev_start with the vrings parameter conditional on the
> > feature flag? That way the caller can enable them later or just
> > enable all the rings at once.
>
> Which specific caller do you mean here?
>
> For vdpa-dev, I don't see any problem with just passing vrings=true
> unconditionally. That is clearly the least problematic order even if
> another order may in theory be allowed by the spec.
>
> For vhost_net, I don't know. As far as I am concerned, there is no
> reason to change its call and it could continue to pass vrings=false.
>
> So vhost_dev_start() seems entirely unproblematic.
>

Yes I think the same.

> The only part that changes for net is that vhost_set_vring_enable() now
> does something where it didn't do anything before. If that is a problem
> for vdpa based network code, maybe we can just special case vdpa there
> to not call vhost_ops->vhost_set_vring_enable? (Despite its name, it's a
> network specific function; it should probably be called something like
> vhost_net_set_vring_enable.)
>
> With something like the below, net shouldn't see any difference any
> more, right? (Happy to add a comment before it if you write it for me.)
>

An early return should work, but I'm not sure if that is the most
sustainable way. We should remove peer->type conditionals as much as
possible :).

Splitting vhost_ops would be better, yes. But it should be split
between virtio vring enable and vhost vring enable. The first one does
not accept a boolean, as it only goes in one direction. But this
option requires more changes, so I'm happy with the early return for
now. Not sure about Jason or MST.
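
As a sketch of what that split could look like in VhostOps; the second
callback name here is invented just for illustration, only
.vhost_set_vring_enable exists today:

/* vhost-user style: can enable and disable, so it takes a boolean */
int (*vhost_set_vring_enable)(struct vhost_dev *dev, int enable);

/* vdpa style: one-way "ready" with no boolean (hypothetical name) */
int (*vhost_set_vring_ready)(struct vhost_dev *dev);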

As a draft for the comment:

Net vhost-vdpa devices need to enable dataplane virtqueues after
DRIVER_OK, so they can recover the device state before starting the
dataplane. Because of that, don't enable the virtqueues here and leave
it to net/vhost-vdpa.c.
---
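
Putting that comment together with the early return from your hunk
below, the result in vhost_set_vring_enable() would look roughly like
this (same code as in your diff, just with the comment attached):

    /*
     * Net vhost-vdpa devices need to enable dataplane virtqueues after
     * DRIVER_OK, so they can recover the device state before starting
     * the dataplane. Because of that, don't enable the virtqueues here
     * and leave it to net/vhost-vdpa.c.
     */
    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
        return 0;
    }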

And maybe we can add another comment to the vhost_dev_start doc?
Something along the lines of:
* vrings = true means vhost_dev_start will start all vrings; vrings =
false delegates vring initialization to the caller.
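
For instance, something like this next to the vhost_dev_start()
prototype (just a sketch; I'm assuming the declaration still takes the
bool vrings parameter as in your patch, and the wording is up to you):

/*
 * vhost_dev_start() - start the vhost device
 * @hdev:   common vhost_dev structure
 * @vdev:   the VirtIODevice structure
 * @vrings: true to have vhost_dev_start() enable all vrings itself,
 *          false to delegate vring enabling to the caller
 */
int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);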

Does that make sense to you?

Thanks!

[1] https://lore.kernel.org/qemu-devel/0aae4d77-2c03-7ba2-8496-024b5a683449@redhat.com/

> Kevin
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..fb6d13bd69 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -543,6 +543,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
>
>      nc->vring_enable = enable;
>
> +    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +        return 0;
> +    }
> +
>      if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
>          return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
>      }
>



