From: Jason Wang
Subject: Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Date: Fri, 17 Jun 2022 09:29:16 +0800

On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> 
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin 
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> 
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > The control virtqueue is used by the networking device to accept
> > > > > > > > > > various commands from the driver. It is a must for supporting
> > > > > > > > > > multiqueue and other configurations.
> > > > > > > > > >
> > > > > > > > > > Shadow VirtQueue (SVQ) already makes migration of virtqueue states
> > > > > > > > > > possible, effectively intercepting them so qemu can track which
> > > > > > > > > > regions of memory are dirtied by device action and need migration.
> > > > > > > > > > However, this does not cover the networking device state seen by
> > > > > > > > > > the driver, which is changed by CVQ messages like MAC address
> > > > > > > > > > updates from the driver.
> > > > > > > > > >
> > > > > > > > > > To solve that, this series uses the proposed SVQ infrastructure to
> > > > > > > > > > intercept the networking control messages used by the device. This
> > > > > > > > > > way, qemu is able to update the VirtIONet device model and to
> > > > > > > > > > migrate it.
> > > > > > > > > >
> > > > > > > > > > However, intercepting all queues would slow down device data
> > > > > > > > > > forwarding. To avoid that, only the CVQ is intercepted all the
> > > > > > > > > > time. This is achieved using the ASID infrastructure, which allows
> > > > > > > > > > different translations for different virtqueues. The most
> > > > > > > > > > up-to-date kernel part of ASID is proposed at [1].
> > > > > > > > > >
> > > > > > > > > > You can run qemu in two modes after applying this series:
> > > > > > > > > > intercepting only the cvq with x-cvq-svq=on, or intercepting all
> > > > > > > > > > the virtqueues by adding x-svq=on to the command line:
> > > > > > > > > >
> > > > > > > > > > -netdev
> > > > > > > > > > type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
> > > > > > > > > >
> > > > > > > > > > The first three patches enable updating the virtio-net device
> > > > > > > > > > model for each CVQ message acknowledged by the device.
> > > > > > > > > >
> > > > > > > > > > Patches 5 to 9 enable an individual SVQ to copy the buffers to
> > > > > > > > > > QEMU's VA. This simplifies the memory mapping, instead of mapping
> > > > > > > > > > all of the guest's memory as in the data virtqueues.
> > > > > > > > > >
> > > > > > > > > > Patch 10 allows injecting control messages into the device. This
> > > > > > > > > > makes it possible to set device state both at QEMU startup and at
> > > > > > > > > > the live migration destination. In the future, this may also be
> > > > > > > > > > used to emulate _F_ANNOUNCE.
> > > > > > > > > >
> > > > > > > > > > Patch 11 updates the kernel headers, but it assigns random numbers
> > > > > > > > > > to the needed ioctls because they are not yet accepted in the
> > > > > > > > > > kernel.
> > > > > > > > > >
> > > > > > > > > > Patches 12-16 enable setting the features of the net device model
> > > > > > > > > > on the vdpa device at device start.
> > > > > > > > > >
> > > > > > > > > > The last ones enable the separated ASID and SVQ.
> > > > > > > > > >
> > > > > > > > > > Comments are welcome.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > As discussed, I think we need to split this huge series into 
> > > > > > > > > smaller ones:
> > > > > > > > >
> > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > 2) ASID support for CVQ
> > > > > > > > >
> > > > > > > > > And for 1) we need to consider whether or not it could be
> > > > > > > > > simplified.
> > > > > > > > >
> > > > > > > > > Or do it in reverse order, since if we do 1) first, we may 
> > > > > > > > > have security
> > > > > > > > > issues.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'm ok with both, but I also think 2) before 1) might make more 
> > > > > > > > sense.
> > > > > > > > There is no way to shadow only the CVQ otherwise ATM.
> > > > > > > >
> > > > > > >
> > > > > > > On second thought, that order is kind of harder.
> > > > > > >
> > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > a. Copy them to controlled buffers
> > > > > > > b. Track properly when to unmap them
> > > > > >
> > > > > > Just to make sure we're on the same page:
> > > > > >
> > > > > > I meant we can start with e.g. having a dedicated ASID for CVQ but
> > > > > > still using CVQ passthrough.
> > > > > >
> > > > >
> > > > > That would imply duplicating all the memory listener updates to both
> > > > > ASIDs. That part of the code would need to be reverted later. I'm ok
> > > > > with that, but I'm not sure it's worth doing it that way.
> > > >
> > > > I don't get why it is related to memory listeners. The only changes
> > > > are (sketch below):
> > > >
> > > > 1) read the groups
> > > > 2) set cvq to be an independent asid
> > > > 3) update CVQ's IOTLB with its own ASID
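> > > >
> > > > A rough C sketch of those three steps, assuming the provisional
> > > > ioctl names from patch 11 and hypothetical vdpa_fd / cvq_index
> > > > variables (error handling trimmed):
> > > >
> > > >     struct vhost_vring_state state = { .index = cvq_index };
> > > >
> > > >     /* 1) read which group the CVQ belongs to (returned in .num) */
> > > >     if (ioctl(vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &state) < 0) {
> > > >         return -errno;
> > > >     }
> > > >
> > > >     /* 2) attach that group to an independent ASID, e.g. 1 */
> > > >     state.index = state.num;
> > > >     state.num = 1;
> > > >     if (ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &state) < 0) {
> > > >         return -errno;
> > > >     }
> > > >
> > > >     /* 3) from here on, IOTLB updates for the CVQ are sent with
> > > >      * asid = 1 in struct vhost_msg_v2 */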
> > > >
> > >
> > > How to track the mappings of step 3) without a copy?
> >
> > So let me try to explain: what I propose is to split the patches, and
> > the above could be the first part. Since we know:
> >
> > 1) CVQ is passed through to the guest right now
> > 2) We know CVQ will use an independent ASID
> >
> > It doesn't hurt to implement those first. It's unrelated to the policy
> > (e.g. how to shadow CVQ).
> >
> > >
> > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > unmap the CVQ buffers' memory. Many CVQ buffers could be in the same
> > > page, so we need to refcount them (or a similar solution).
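> > >
> > > That tracking would be something like this minimal sketch (helper
> > > names hypothetical, glib used for brevity), where a page is only
> > > really DMA-mapped/unmapped on its first/last user:
> > >
> > >     /* refs: page frame number -> live mapping count, created with
> > >      * g_hash_table_new(g_direct_hash, g_direct_equal) */
> > >     static void page_ref(GHashTable *refs, guint pfn)
> > >     {
> > >         gpointer key = GUINT_TO_POINTER(pfn);
> > >         guint n = GPOINTER_TO_UINT(g_hash_table_lookup(refs, key));
> > >         if (n == 0) {
> > >             /* first user: DMA-map the page here */
> > >         }
> > >         g_hash_table_insert(refs, key, GUINT_TO_POINTER(n + 1));
> > >     }
> > >
> > >     static void page_unref(GHashTable *refs, guint pfn)
> > >     {
> > >         gpointer key = GUINT_TO_POINTER(pfn);
> > >         guint n = GPOINTER_TO_UINT(g_hash_table_lookup(refs, key));
> > >         if (n <= 1) {
> > >             g_hash_table_remove(refs, key);
> > >             /* last user gone: DMA-unmap the page here */
> > >         } else {
> > >             g_hash_table_insert(refs, key, GUINT_TO_POINTER(n - 1));
> > >         }
> > >     }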
> >
> > Can we use a fixed mapping instead of dynamic ones?
> >
>
> That implies either implementing something like a memory ring (of what
> size?), or effectively duplicating the memory listener mappings.

I'm not sure I get this.

But it's mainly the CVQ buffer + CVQ virtqueue.

It should be possible if:

1) allocate something like a buffer of several megabytes
2) only process one CVQ command from the guest at a time

?
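
A minimal sketch of that scheme (all names here are hypothetical, not
from the series): one fixed buffer, DMA-mapped into the CVQ ASID once
at start, and a single in-flight command at a time, so no dynamic
map/unmap is needed:

    #define CVQ_BOUNCE_SIZE (2 * 1024 * 1024)

    static uint8_t cvq_bounce[CVQ_BOUNCE_SIZE]; /* mapped once */

    /* Copy one guest command in, let the device consume it from the
     * bounce buffer, then copy the one-byte ack back out. */
    static int cvq_process_one(const void *cmd, size_t len, uint8_t *ack)
    {
        if (len + 1 > CVQ_BOUNCE_SIZE) {
            return -1; /* command too large for the fixed buffer */
        }
        memcpy(cvq_bounce, cmd, len);
        /* ... expose the bounce IOVA to the device, kick the vring,
         * and poll until the descriptor is used ... */
        *ack = cvq_bounce[len];
        return 0;
    }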

>
> I'm not against that, but it's something we would need to remove in the
> final solution. Using the order presented here avoids that.
>
> > >
> > > This series copies the buffers to an independent buffer in qemu memory
> > > to avoid that. Once you copy them, we have the problem you point out at
> > > some later patch: the guest controls the buffers, so qemu must
> > > understand CVQ so the guest cannot trick it. All of this is orthogonal
> > > to ASID. At that point, we have this series except for the asid part
> > > and the injection of CVQ buffers at the LM destination, don't we?
> >
> > So we have several things:
> >
> > 1) ASID support
> > 2) Shadow CVQ only
> > 3) State restoring
> >
> > I hope we can split them into independent series. If we want to shadow
> > CVQ first, we need to prove that it is safe without ASID.
> >
> > >
> > > CVQ buffers can be copied into the qemu IOVA space and offered to the
> > > device. Much like the SVQ vrings, the copied buffers will not be
> > > accessible from the guest. The hw device (as the "non emulated cvq")
> > > will receive a lot of dma updates, but that's temporary. We can add
> > > ASID on top of that as a means to:
> > > - Not shadow the data plane (fundamental to the intended use case of
> > >   vdpa).
> > > - Not pollute the data plane's DMA mappings.
> > >
> > > > ?
> > > >
> > > > >
> > > > > > Then do other stuff on top.
> > > > > >
> > > > > > >
> > > > > > > Alternative a. has the same problems exposed in this RFC: it's
> > > > > > > hard (and unneeded in the final version) to know the size to
> > > > > > > copy. Alternative b. also requires things not needed in the
> > > > > > > final version, like counting the number of times each page is
> > > > > > > mapped and unmapped.
> > > > > > >
> > > > > > > So I'll go with the first alternative, which is also the
> > > > > > > proposed order of the RFC. What security issues do you expect
> > > > > > > beyond the comments in this series?
> > > > > >
> > > > > > If we shadow CVQ without ASID, the guest may guess the IOVA of
> > > > > > the CVQ and try to peek at/modify it?
> > > > > >
> > > > >
> > > > > It works the same way as the data vqs; we're just updating the
> > > > > device model in the middle. It should imply exactly the same risk
> > > > > as updating an emulated NIC's control plane (including
> > > > > vhost-kernel / vhost-user).
> > > >
> > > > Not sure I got you here. For vhost-kernel and vhost-user, the CVQ's
> > > > buffer is owned by the guest.
> > > >
> > >
> > > The same way they control the data plane when all data virtqueues are
> > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > the device model below).
> >
> > Ok.
> >
> > >
> > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > there's no way to prevent guests from accessing it?
> > > >
> > >
> > > With SVQ, the memory exposed to the device is already shadowed. The
> > > guest cannot access the CVQ buffers' memory, the same way it cannot
> > > access the SVQ vrings.
> >
> > Ok, I think I get you now; it looks like we have different assumptions
> > here. If we only shadow CVQ, it will have security issues, since RX/TX
> > is not shadowed. If we shadow CVQ as well as TX/RX, there's no security
> > issue, since each IOVA is validated and the descriptors are prepared by
> > Qemu.
> >
>
> Right. I expected to maintain the all-shadowed-or-nothing behavior,
> sorry if I was not clear.
>
> > This goes back to another question, what's the order of the series.
> >
>
> I think that the shortest path is to follow the order of this series.
> I tried to reorder your way, but ASID patches have to come with a lot
> of CVQ patches if we want proper validation.

Ok, so if this is the case, let's just split this series and keep the order.

>
> We can take the long route if we either implement a fixed ring buffer,
> memory listener cloning, or another use case (sub-slicing?). But I
> expect more issues to arise there.
>
> Actually, I have another question: is it ok to implement the cvq use
> case but not to merge the x-svq parameter? The more I think about the
> parameter, the more I see it's better to leave it as a separate patch
> for testing until we shape the complete series, where it will be
> unneeded.

That's fine.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > > If, in the case of vhost-kernel/vhost-user, there's a way for the
> > > > guest to exploit buffers owned by Qemu, that would be a bug.
> > > >
> > >
> > > The only extra step is the call to virtio_net_handle_ctrl_iov
> > > (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> > > in SVQ mode, it can exploit it with other vhost backends too, as far
> > > as I can see.
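> > >
> > > For reference, that extra step looks roughly like this (a sketch;
> > > see patch 01 for the actual signature and the in/out iovec layout):
> > >
> > >     /* the device already acked the shadowed command; replay it
> > >      * into the emulated device model so qemu's VirtIONet state
> > >      * (MAC, MQ, ...) stays in sync */
> > >     if (status == VIRTIO_NET_OK) {
> > >         virtio_net_handle_ctrl_iov(vdev, &in_sg, 1, &out_sg, 1);
> > >     }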
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Roughly speaking, the idea is just to propose patches 01 to 03,
> > > > > with your comments addressed. That already covers use cases like
> > > > > rx filter notifications for devices with only one ASID.
> > > > >
> > >
> > > This part of my mail is not correct; we need to add a few patches of
> > > this series on top :). Otherwise, it would be exploitable.
> > >
> > > Thanks!
> > >
> > > > > Thanks!
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > > Can we do as with the previous base SVQ patches? They were
> > > > > > > > merged even though there is still no way to enable SVQ.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > TODO:
> > > > > > > > > > * Fall back on the regular CVQ if QEMU cannot isolate it in its
> > > > > > > > > >    own ASID for any reason, blocking migration. This is tricky,
> > > > > > > > > >    since it can leave the VM unable to migrate anymore, so some
> > > > > > > > > >    way of blocking it must be used.
> > > > > > > > > > * Review failure paths; some have TODO notes, others don't.
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v7:
> > > > > > > > > > * Don't map all the guest address space in ASID 1, but copy all
> > > > > > > > > >    the buffers. No need for more memory listeners.
> > > > > > > > > > * Move net backend start callback to SVQ.
> > > > > > > > > > * Wait for CVQ commands to be used by the device at SVQ start,
> > > > > > > > > >    avoiding races.
> > > > > > > > > > * Changed ioctls, but they're provisional anyway.
> > > > > > > > > > * Reorder commits so refactoring and code-adding ones are closer
> > > > > > > > > >    to their usage.
> > > > > > > > > > * Usual cleanup: better tracing, docs, patch messages, ...
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v6:
> > > > > > > > > > * Fix bad ordering of iotlb updates when batching was enabled
> > > > > > > > > > * Add reference counting to iova_tree so cleaning is 
> > > > > > > > > > simpler.
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v5:
> > > > > > > > > > * Fix bad calculation of the cvq end group when MQ is not acked
> > > > > > > > > >    by the guest.
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v4:
> > > > > > > > > > * Add missing tracing
> > > > > > > > > > * Add multiqueue support
> > > > > > > > > > * Use the already-sent version to replace g_memdup
> > > > > > > > > > * More careful memory management
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v3:
> > > > > > > > > > * Fix descriptors being wrongly returned to the SVQ list.
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v2:
> > > > > > > > > > * Fix use-after-free.
> > > > > > > > > >
> > > > > > > > > > Changes from rfc v1:
> > > > > > > > > > * Rebase to latest master.
> > > > > > > > > > * Configure ASID instead of assuming cvq asid != data vqs 
> > > > > > > > > > asid.
> > > > > > > > > > * Update device model so (MAC) state can be migrated too.
> > > > > > > > > >
> > > > > > > > > > [1] 
> > > > > > > > > > https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/
> > > > > > > > > >
> > > > > > > > > > Eugenio Pérez (21):
> > > > > > > > > >    virtio-net: Expose ctrl virtqueue logic
> > > > > > > > > >    vhost: Add custom used buffer callback
> > > > > > > > > >    vdpa: control virtqueue support on shadow virtqueue
> > > > > > > > > >    virtio: Make virtqueue_alloc_element non-static
> > > > > > > > > >    vhost: Add vhost_iova_tree_find
> > > > > > > > > >    vdpa: Add map/unmap operation callback to SVQ
> > > > > > > > > >    vhost: move descriptor translation to vhost_svq_vring_write_descs
> > > > > > > > > >    vhost: Add SVQElement
> > > > > > > > > >    vhost: Add svq copy desc mode
> > > > > > > > > >    vhost: Add vhost_svq_inject
> > > > > > > > > >    vhost: Update kernel headers
> > > > > > > > > >    vdpa: delay set_vring_ready after DRIVER_OK
> > > > > > > > > >    vhost: Add ShadowVirtQueueStart operation
> > > > > > > > > >    vhost: Make possible to check for device exclusive vq group
> > > > > > > > > >    vhost: add vhost_svq_poll
> > > > > > > > > >    vdpa: Add vhost_vdpa_start_control_svq
> > > > > > > > > >    vdpa: Add asid attribute to vdpa device
> > > > > > > > > >    vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
> > > > > > > > > >    vhost: Add reference counting to vhost_iova_tree
> > > > > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > > > > >    vdpa: Add x-cvq-svq
> > > > > > > > > >
> > > > > > > > > >   qapi/net.json                                |  13 +-
> > > > > > > > > >   hw/virtio/vhost-iova-tree.h                  |   7 +-
> > > > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
> > > > > > > > > >   include/hw/virtio/vhost-vdpa.h               |   3 +
> > > > > > > > > >   include/hw/virtio/vhost.h                    |   3 +
> > > > > > > > > >   include/hw/virtio/virtio-net.h               |   4 +
> > > > > > > > > >   include/hw/virtio/virtio.h                   |   1 +
> > > > > > > > > >   include/standard-headers/linux/vhost_types.h |  11 +-
> > > > > > > > > >   linux-headers/linux/vhost.h                  |  25 +-
> > > > > > > > > >   hw/net/vhost_net.c                           |   5 +-
> > > > > > > > > >   hw/net/virtio-net.c                          |  84 +++--
> > > > > > > > > >   hw/virtio/vhost-iova-tree.c                  |  35 +-
> > > > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
> > > > > > > > > >   hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
> > > > > > > > > >   hw/virtio/virtio.c                           |   2 +-
> > > > > > > > > >   net/vhost-vdpa.c                             | 294 ++++++++++++++-
> > > > > > > > > >   hw/virtio/trace-events                       |  10 +-
> > > > > > > > > >   17 files changed, 1012 insertions(+), 130 deletions(-)
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



