qemu-devel

Re: [Qemu-devel] [PATCH V6 4/5] pvrdma: initial implementation


From: Yuval Shaia
Subject: Re: [Qemu-devel] [PATCH V6 4/5] pvrdma: initial implementation
Date: Tue, 9 Jan 2018 13:08:33 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

On Tue, Jan 09, 2018 at 11:39:11AM +0100, Cornelia Huck wrote:
> On Sun,  7 Jan 2018 14:32:23 +0200
> Marcel Apfelbaum <address@hidden> wrote:
> 
> > From: Yuval Shaia <address@hidden>
> > 
> > PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
> > It works with its Linux Kernel driver AS IS, no need for any special guest
> > modifications.
> > 
> > While it complies with the VMware device, it can also communicate with bare
> > metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
> > can work with Soft-RoCE (rxe).
> > 
> > It does not require the whole guest RAM to be pinned allowing memory
> > over-commit and, even if not implemented yet, migration support will be
> > possible with some HW assistance.
> > 
> > Signed-off-by: Yuval Shaia <address@hidden>
> > Signed-off-by: Marcel Apfelbaum <address@hidden>
> > ---
> >  Makefile.objs                      |   2 +
> >  configure                          |   9 +-
> >  default-configs/arm-softmmu.mak    |   1 +
> >  default-configs/i386-softmmu.mak   |   1 +
> >  default-configs/x86_64-softmmu.mak |   1 +
> >  hw/Makefile.objs                   |   1 +
> >  hw/rdma/Makefile.objs              |   6 +
> >  hw/rdma/rdma_backend.c             | 815 +++++++++++++++++++++++++++++++++++++
> >  hw/rdma/rdma_backend.h             |  92 +++++
> >  hw/rdma/rdma_backend_defs.h        |  62 +++
> >  hw/rdma/rdma_rm.c                  | 619 ++++++++++++++++++++++++++++
> >  hw/rdma/rdma_rm.h                  |  69 ++++
> >  hw/rdma/rdma_rm_defs.h             | 106 +++++
> >  hw/rdma/rdma_utils.c               |  52 +++
> >  hw/rdma/rdma_utils.h               |  43 ++
> >  hw/rdma/trace-events               |   5 +
> >  hw/rdma/vmw/pvrdma.h               | 122 ++++++
> >  hw/rdma/vmw/pvrdma_cmd.c           | 679 ++++++++++++++++++++++++++++++
> >  hw/rdma/vmw/pvrdma_dev_api.h       | 602 +++++++++++++++++++++++++++
> >  hw/rdma/vmw/pvrdma_dev_ring.c      | 139 +++++++
> >  hw/rdma/vmw/pvrdma_dev_ring.h      |  42 ++
> >  hw/rdma/vmw/pvrdma_ib_verbs.h      | 433 ++++++++++++++++++++
> >  hw/rdma/vmw/pvrdma_main.c          | 644 +++++++++++++++++++++++++++++
> >  hw/rdma/vmw/pvrdma_qp_ops.c        | 212 ++++++++++
> >  hw/rdma/vmw/pvrdma_qp_ops.h        |  27 ++
> >  hw/rdma/vmw/pvrdma_ring.h          | 134 ++++++
> >  hw/rdma/vmw/trace-events           |   5 +
> >  hw/rdma/vmw/vmw_pvrdma-abi.h       | 311 ++++++++++++++
> >  include/hw/pci/pci_ids.h           |   3 +
> >  29 files changed, 5233 insertions(+), 4 deletions(-)
> >  create mode 100644 hw/rdma/Makefile.objs
> >  create mode 100644 hw/rdma/rdma_backend.c
> >  create mode 100644 hw/rdma/rdma_backend.h
> >  create mode 100644 hw/rdma/rdma_backend_defs.h
> >  create mode 100644 hw/rdma/rdma_rm.c
> >  create mode 100644 hw/rdma/rdma_rm.h
> >  create mode 100644 hw/rdma/rdma_rm_defs.h
> >  create mode 100644 hw/rdma/rdma_utils.c
> >  create mode 100644 hw/rdma/rdma_utils.h
> >  create mode 100644 hw/rdma/trace-events
> >  create mode 100644 hw/rdma/vmw/pvrdma.h
> >  create mode 100644 hw/rdma/vmw/pvrdma_cmd.c
> >  create mode 100644 hw/rdma/vmw/pvrdma_dev_api.h
> >  create mode 100644 hw/rdma/vmw/pvrdma_dev_ring.c
> >  create mode 100644 hw/rdma/vmw/pvrdma_dev_ring.h
> >  create mode 100644 hw/rdma/vmw/pvrdma_ib_verbs.h
> >  create mode 100644 hw/rdma/vmw/pvrdma_main.c
> >  create mode 100644 hw/rdma/vmw/pvrdma_qp_ops.c
> >  create mode 100644 hw/rdma/vmw/pvrdma_qp_ops.h
> >  create mode 100644 hw/rdma/vmw/pvrdma_ring.h
> >  create mode 100644 hw/rdma/vmw/trace-events
> >  create mode 100644 hw/rdma/vmw/vmw_pvrdma-abi.h
> 
> (...)
> 
> > diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> > index b0d6e65038..0e7a3c1700 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -132,3 +132,4 @@ CONFIG_GPIO_KEY=y
> >  CONFIG_MSF2=y
> >  CONFIG_FW_CFG_DMA=y
> >  CONFIG_XILINX_AXI=y
> > +CONFIG_PVRDMA=y
> > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > index 95ac4b464a..88298e4ef5 100644
> > --- a/default-configs/i386-softmmu.mak
> > +++ b/default-configs/i386-softmmu.mak
> > @@ -61,3 +61,4 @@ CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
> >  CONFIG_PXB=y
> >  CONFIG_ACPI_VMGENID=y
> >  CONFIG_FW_CFG_DMA=y
> > +CONFIG_PVRDMA=y
> > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > index 0221236825..f571da36eb 100644
> > --- a/default-configs/x86_64-softmmu.mak
> > +++ b/default-configs/x86_64-softmmu.mak
> > @@ -61,3 +61,4 @@ CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
> >  CONFIG_PXB=y
> >  CONFIG_ACPI_VMGENID=y
> >  CONFIG_FW_CFG_DMA=y
> > +CONFIG_PVRDMA=y
> 
> Any reason you did not add this to other architectures?
> 
> I added "CONFIG_PVRDMA=$(CONFIG_PCI)" to s390x-softmmu.mak, and it at
> least builds (did not try to actually get it to work, although I don't
> see any immediate blocker for that).
> 
> (...)
> 
> > diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> > new file mode 100644
> > index 0000000000..dcb799f49b
> > --- /dev/null
> > +++ b/hw/rdma/rdma_backend.c
> 
> (...)
> 
> > +static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq,
> > +                    bool one_poll)
> > +{
> > +    int i, ne;
> > +    BackendCtx *bctx;
> > +    struct ibv_wc wc[2];
> > +
> > +    pr_dbg("Entering poll_cq loop on cq %p\n", ibcq);
> > +    do {
> > +        ne = ibv_poll_cq(ibcq, 2, wc);
> > +        if (ne == 0 && one_poll) {
> > +            pr_dbg("CQ is empty\n");
> > +            return;
> > +        }
> > +    } while (ne < 0);
> > +
> > +    pr_dbg("Got %d completion(s) from cq %p\n", ne, ibcq);
> > +
> > +    for (i = 0; i < ne; i++) {
> > +        pr_dbg("wr_id=0x%lx\n", wc[i].wr_id);
> > +        pr_dbg("status=%d\n", wc[i].status);
> > +
> > +        bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, wc[i].wr_id);
> > +        if (unlikely(!bctx)) {
> > +            pr_dbg("Error: Fail to find ctx for req %ld\n", wc[i].wr_id);
> 
> s/Fail/Failed/
> 
> (A lot of these throughout the various files. Just thought I'd point
> that out; but I don't really have time to do a real review.)

Thanks!

> 
> > +            continue;
> > +        }
> > +        pr_dbg("Processing %s CQE\n", bctx->is_tx_req ? "send" : "recv");
> > +
> > +        comp_handler(wc[i].status, wc[i].vendor_err, bctx->up_ctx);
> > +
> > +        rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
> > +        free(bctx);
> > +    }
> > +}
> 
> (...)
> 
> > diff --git a/hw/rdma/vmw/pvrdma_dev_api.h b/hw/rdma/vmw/pvrdma_dev_api.h
> > new file mode 100644
> > index 0000000000..bf1986a976
> > --- /dev/null
> > +++ b/hw/rdma/vmw/pvrdma_dev_api.h
> > @@ -0,0 +1,602 @@
> > +/*
> > + * QEMU VMWARE paravirtual RDMA device definitions
> > + *
> > + * Copyright (C) 2018 Oracle
> > + * Copyright (C) 2018 Red Hat Inc
> > + *
> > + * Authors:
> > + *     Yuval Shaia <address@hidden>
> > + *     Marcel Apfelbaum <address@hidden>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef PVRDMA_DEV_API_H
> > +#define PVRDMA_DEV_API_H
> > +
> > +/*
> > + * Following is an interface definition for PVRDMA device as provided by
> > + * VMWARE.
> > + * See original copyright from Linux kernel v4.14.5 header file
> > + * drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h
> 
> Could that file be exported as UAPI in the kernel and added to the
> linux-headers script?

We took this approach, as opposed to kernel-headers, with the following in
mind:
(1) This is the convention used in vmxnet3.
(2) vmw_pvrdma was introduced only recently; taking the kernel-headers
approach would force a specific kernel on the host in order to compile QEMU.
(3) To support VMware's pvrdma device we took a snapshot of the existing
driver/device settings and froze there. This is a driver/device API and we
can't allow ourselves to chase VMware's tail whenever they change the
API. Just consider a case where they change, for example, the ARM bit.

Just IMHO.
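
To illustrate point (3): with a frozen snapshot, the layout the device
expects can be pinned at compile time, so an accidental header refresh
fails the build instead of silently changing the driver/device API. This
is only a minimal sketch under assumptions; example_ring_state and
example_layout_ok are hypothetical names, not taken from the pvrdma
headers or from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a structure copied from the frozen snapshot.
 * The name and fields are illustrative, not taken from the pvrdma headers.
 */
struct example_ring_state {
    uint32_t prod_tail;   /* producer index */
    uint32_t cons_head;   /* consumer index */
};

/* Fail the build if a later header refresh changes size or offsets. */
static_assert(sizeof(struct example_ring_state) == 8,
              "example_ring_state size changed");
static_assert(offsetof(struct example_ring_state, prod_tail) == 0,
              "prod_tail moved");
static_assert(offsetof(struct example_ring_state, cons_head) == 4,
              "cons_head moved");

/* Same check at run time; returns 1 when the layout matches. */
static inline int example_layout_ok(void)
{
    return sizeof(struct example_ring_state) == 8 &&
           offsetof(struct example_ring_state, cons_head) == 4;
}
```

The asserts cost nothing at run time; the helper is there only so the
check can also be called from a unit test.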

> 
> (...)
> 
> > diff --git a/hw/rdma/vmw/pvrdma_ib_verbs.h b/hw/rdma/vmw/pvrdma_ib_verbs.h
> > new file mode 100644
> > index 0000000000..cf1430024b
> > --- /dev/null
> > +++ b/hw/rdma/vmw/pvrdma_ib_verbs.h
> > @@ -0,0 +1,433 @@
> > +/*
> > + * QEMU VMWARE paravirtual RDMA device definitions
> > + *
> > + * Copyright (C) 2018 Oracle
> > + * Copyright (C) 2018 Red Hat Inc
> > + *
> > + * Authors:
> > + *     Yuval Shaia <address@hidden>
> > + *     Marcel Apfelbaum <address@hidden>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef PVRDMA_IB_VERBS_H
> > +#define PVRDMA_IB_VERBS_H
> > +
> > +/*
> > + * VMWARE headers we got from Linux kernel do not fully comply QEMU coding
> > + * standards in sense of types and defines used.
> > + * Since we didn't want to change VMWARE code, following set of typedefs
> > + * and defines needed to compile these headers with QEMU introduced.
> > + */
> > +
> > +#define u8     uint8_t
> > +#define u16    unsigned short
> > +#define u32    uint32_t
> > +#define u64    uint64_t
> 
> I think the headers update already takes care of some conversions.
> Otherwise, same comment as for the header above.

Sorry, I'm not following, can you elaborate on that?
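
For reference while discussing this, the shims quoted above could also be
written as typedefs on the fixed-width types, which makes the widths
checkable at compile time. This is a sketch of one possible alternative,
not what the patch does; example_low16 is a hypothetical helper added only
to exercise the types.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative alternative to the #define shims; not part of the patch. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Check that each alias has the width its name promises. */
static_assert(sizeof(u8)  * CHAR_BIT == 8,  "u8 must be 8 bits");
static_assert(sizeof(u16) * CHAR_BIT == 16, "u16 must be 16 bits");
static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 must be 32 bits");
static_assert(sizeof(u64) * CHAR_BIT == 64, "u64 must be 64 bits");

/* Hypothetical helper: keep the low 16 bits of a 32-bit value. */
static inline u16 example_low16(u32 value)
{
    return (u16)(value & 0xffffu);
}
```

Unlike a macro, a typedef is scoped and cannot rewrite unrelated
identifiers that happen to be named u8 or u16.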

> 
> > +
> > +/*
> > + * Following is an interface definition for PVRDMA device as provided by
> > + * VMWARE.
> > + * See original copyright from Linux kernel v4.14.5 header file
> > + * drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
> > + */
> 
> (...)
> 
> > diff --git a/hw/rdma/vmw/vmw_pvrdma-abi.h b/hw/rdma/vmw/vmw_pvrdma-abi.h
> > new file mode 100644
> > index 0000000000..8cfb9d7745
> > --- /dev/null
> > +++ b/hw/rdma/vmw/vmw_pvrdma-abi.h
> > @@ -0,0 +1,311 @@
> > +/*
> > + * QEMU VMWARE paravirtual RDMA device definitions
> > + *
> > + * Copyright (C) 2018 Oracle
> > + * Copyright (C) 2018 Red Hat Inc
> > + *
> > + * Authors:
> > + *     Yuval Shaia <address@hidden>
> > + *     Marcel Apfelbaum <address@hidden>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef VMW_PVRDMA_ABI_H
> > +#define VMW_PVRDMA_ABI_H
> > +
> > +/*
> > + * Following is an interface definition for PVRDMA device as provided by
> > + * VMWARE.
> > + * See original copyright from Linux kernel v4.14.5 header file
> > + * include/uapi/rdma/vmw_pvrdma-abi.h
> > + */
> 
> This one is already exported.

Same argument as above.

Yuval


