Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
From: Leon Romanovsky
Subject: Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device
Date: Thu, 30 Mar 2017 17:13:14 +0300
User-agent: Mutt/1.8.0 (2017-02-23)
On Thu, Mar 30, 2017 at 02:12:21PM +0300, Marcel Apfelbaum wrote:
> From: Yuval Shaia <address@hidden>
>
> Hi,
>
> General description
> ===================
> This is a very early RFC of a new RoCE emulated device
> that enables guests to use the RDMA stack without having
> a real hardware in the host.
>
> The current implementation supports only VM-to-VM communication
> on the same host.
> Down the road we plan to support inter-machine communication
> by utilizing physical RoCE devices or Soft RoCE.
>
> The goals are:
> - Reach fast, secure and lossless inter-VM data exchange.
> - Support remote VMs or bare metal machines.
> - Allow VM migration.
> - Avoid the need to pin all VM memory.
>
>
> Objective
> =========
> Have a QEMU implementation of the PVRDMA device. We aim to do so without
> any change in the PVRDMA guest driver, which is already merged into the
> upstream kernel.
>
>
> RFC status
> ===========
> The project is in early development stages and supports
> only basic send/receive operations.
>
> We present it now to get feedback on the design and
> feature demands, and to receive comments from the
> community pointing us in the "right" direction.
Judging by the feedback you got from the RDMA community
on the kernel proposal [1], the community failed to understand:
1. Why do you need a new module?
2. Why are the existing solutions not enough, and why can't they be extended?
3. Why can't RXE (Soft RoCE) be extended to perform this inter-VM
communication via a virtual NIC?
Can you please help us fill this knowledge gap?
[1] http://marc.info/?l=linux-rdma&m=149063626907175&w=2
Thanks
>
> What does work:
> - Tested with a basic unit-test:
> - https://github.com/yuvalshaia/kibpingpong .
> It works fine with two devices on a single VM, but has
> some issues between two VMs on the same host.
>
>
> Design
> ======
> - Follows the behavior of VMware's pvrdma device, however it is not
> tightly coupled with it and most of the code can be reused if we
> decide to continue to a virtio-based RDMA device.
>
> - It exposes 3 BARs:
> BAR 0 - MSIX, utilizing 3 vectors for the command ring, async events
> and completions
> BAR 1 - Configuration registers
> BAR 2 - UAR, used to pass HW commands from the driver.
>
> - The device performs internal management of the RDMA
> resources (PDs, CQs, QPs, ...), meaning the objects
> are not directly coupled to a physical RDMA device's resources.
>
> - As backend, the pvrdma device uses KDBR, a new kernel module which
> is also in RFC phase, read more on the linux-rdma list:
> - https://www.spinics.net/lists/linux-rdma/msg47951.html
>
> - All RDMA operations are converted to KDBR module calls, which perform
> the actual transfer between VMs or, in the future,
> will utilize a RoCE device (either physical or soft)
> to communicate with another host.
>
>
> Roadmap (out of order)
> ======================
> - Utilize the RoCE host driver in order to support peers on external hosts.
> - Re-use the code for a virtio based device.
>
> Any ideas, comments or suggestions would be highly appreciated.
>
> Thanks,
> Yuval Shaia & Marcel Apfelbaum
>
> Signed-off-by: Yuval Shaia <address@hidden>
> (Mainly design, coding was done by Yuval)
> Signed-off-by: Marcel Apfelbaum <address@hidden>
>
> ---
> hw/net/Makefile.objs | 5 +
> hw/net/pvrdma/kdbr.h | 104 +++++++
> hw/net/pvrdma/pvrdma-uapi.h | 261 ++++++++++++++++
> hw/net/pvrdma/pvrdma.h | 155 ++++++++++
> hw/net/pvrdma/pvrdma_cmd.c | 322 +++++++++++++++++++
> hw/net/pvrdma/pvrdma_defs.h | 301 ++++++++++++++++++
> hw/net/pvrdma/pvrdma_dev_api.h | 342 ++++++++++++++++++++
> hw/net/pvrdma/pvrdma_ib_verbs.h | 469 ++++++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_kdbr.c | 395 ++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_kdbr.h | 53 ++++
> hw/net/pvrdma/pvrdma_main.c | 667 ++++++++++++++++++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_qp_ops.c | 174 +++++++++++
> hw/net/pvrdma/pvrdma_qp_ops.h | 25 ++
> hw/net/pvrdma/pvrdma_ring.c | 127 ++++++++
> hw/net/pvrdma/pvrdma_ring.h | 43 +++
> hw/net/pvrdma/pvrdma_rm.c | 529 +++++++++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_rm.h | 214 +++++++++++++
> hw/net/pvrdma/pvrdma_types.h | 37 +++
> hw/net/pvrdma/pvrdma_utils.c | 36 +++
> hw/net/pvrdma/pvrdma_utils.h | 49 +++
> include/hw/pci/pci_ids.h | 3 +
> 21 files changed, 4311 insertions(+)
> create mode 100644 hw/net/pvrdma/kdbr.h
> create mode 100644 hw/net/pvrdma/pvrdma-uapi.h
> create mode 100644 hw/net/pvrdma/pvrdma.h
> create mode 100644 hw/net/pvrdma/pvrdma_cmd.c
> create mode 100644 hw/net/pvrdma/pvrdma_defs.h
> create mode 100644 hw/net/pvrdma/pvrdma_dev_api.h
> create mode 100644 hw/net/pvrdma/pvrdma_ib_verbs.h
> create mode 100644 hw/net/pvrdma/pvrdma_kdbr.c
> create mode 100644 hw/net/pvrdma/pvrdma_kdbr.h
> create mode 100644 hw/net/pvrdma/pvrdma_main.c
> create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.c
> create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.h
> create mode 100644 hw/net/pvrdma/pvrdma_ring.c
> create mode 100644 hw/net/pvrdma/pvrdma_ring.h
> create mode 100644 hw/net/pvrdma/pvrdma_rm.c
> create mode 100644 hw/net/pvrdma/pvrdma_rm.h
> create mode 100644 hw/net/pvrdma/pvrdma_types.h
> create mode 100644 hw/net/pvrdma/pvrdma_utils.c
> create mode 100644 hw/net/pvrdma/pvrdma_utils.h
>
> diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
> index 610ed3e..a962347 100644
> --- a/hw/net/Makefile.objs
> +++ b/hw/net/Makefile.objs
> @@ -43,3 +43,8 @@ common-obj-$(CONFIG_ROCKER) += rocker/rocker.o
> rocker/rocker_fp.o \
> rocker/rocker_desc.o rocker/rocker_world.o \
> rocker/rocker_of_dpa.o
> obj-$(call lnot,$(CONFIG_ROCKER)) += rocker/qmp-norocker.o
> +
> +obj-$(CONFIG_PCI) += pvrdma/pvrdma_ring.o pvrdma/pvrdma_rm.o \
> + pvrdma/pvrdma_utils.o pvrdma/pvrdma_qp_ops.o \
> + pvrdma/pvrdma_kdbr.o pvrdma/pvrdma_cmd.o \
> + pvrdma/pvrdma_main.o
> diff --git a/hw/net/pvrdma/kdbr.h b/hw/net/pvrdma/kdbr.h
> new file mode 100644
> index 0000000..97cb93c
> --- /dev/null
> +++ b/hw/net/pvrdma/kdbr.h
> @@ -0,0 +1,104 @@
> +/*
> + * Kernel Data Bridge driver - API
> + *
> + * Copyright 2016 Red Hat, Inc.
> + * Copyright 2016 Oracle
> + *
> + * Authors:
> + * Marcel Apfelbaum <address@hidden>
> + * Yuval Shaia <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef _KDBR_H
> +#define _KDBR_H
> +
> +#ifdef __KERNEL__
> +#include <linux/uio.h>
> +#define KDBR_MAX_IOVEC_LEN UIO_FASTIOV
> +#else
> +#include <sys/uio.h>
> +#define KDBR_MAX_IOVEC_LEN 8
> +#endif
> +
> +#define KDBR_FILE_NAME "/dev/kdbr"
> +#define KDBR_MAX_PORTS 255
> +
> +#define KDBR_IOC_MAGIC 0xBA
> +
> +#define KDBR_REGISTER_PORT _IOWR(KDBR_IOC_MAGIC, 0, struct kdbr_reg)
> +#define KDBR_UNREGISTER_PORT _IOW(KDBR_IOC_MAGIC, 1, int)
> +#define KDBR_IOC_MAX 2
> +
> +
> +enum kdbr_ack_type {
> + KDBR_ACK_IMMEDIATE,
> + KDBR_ACK_DELAYED,
> +};
> +
> +struct kdbr_gid {
> + unsigned long net_id;
> + unsigned long id;
> +};
> +
> +struct kdbr_peer {
> + struct kdbr_gid rgid;
> + unsigned long rqueue;
> +};
> +
> +struct list_head;
> +struct mutex;
> +struct kdbr_connection {
> + unsigned long queue_id;
> + struct kdbr_peer peer;
> + enum kdbr_ack_type ack_type;
> + /* TODO: hide the below fields in the .c file */
> + struct list_head *sg_vecs_list;
> + struct mutex *sg_vecs_mutex;
> +};
> +
> +struct kdbr_reg {
> + struct kdbr_gid gid; /* in */
> + int port; /* out */
> +};
> +
> +#define KDBR_REQ_SIGNATURE 0x000000AB
> +#define KDBR_REQ_POST_RECV 0x00000100
> +#define KDBR_REQ_POST_SEND 0x00000200
> +#define KDBR_REQ_POST_MREG 0x00000300
> +#define KDBR_REQ_POST_RDMA 0x00000400
> +
> +struct kdbr_req {
> + unsigned int flags; /* 8 bits signature, 8 bits msg_type */
> + struct iovec vec[KDBR_MAX_IOVEC_LEN];
> + int vlen; /* <= KDBR_MAX_IOVEC_LEN */
> + int connection_id;
> + struct kdbr_peer peer;
> + unsigned long req_id;
> +};
> +
> +#define KDBR_ERR_CODE_EMPTY_VEC 0x101
> +#define KDBR_ERR_CODE_NO_MORE_RECV_BUF 0x102
> +#define KDBR_ERR_CODE_RECV_BUF_PROT 0x103
> +#define KDBR_ERR_CODE_INV_ADDR 0x104
> +#define KDBR_ERR_CODE_INV_CONN_ID 0x105
> +#define KDBR_ERR_CODE_NO_PEER 0x106
> +
> +struct kdbr_completion {
> + int connection_id;
> + unsigned long req_id;
> + int status; /* 0 = Success */
> +};
> +
> +#define KDBR_PORT_IOC_MAGIC 0xBB
> +
> +#define KDBR_PORT_OPEN_CONN _IOR(KDBR_PORT_IOC_MAGIC, 0, \
> + struct kdbr_connection)
> +#define KDBR_PORT_CLOSE_CONN _IOR(KDBR_PORT_IOC_MAGIC, 1, int)
> +#define KDBR_PORT_IOC_MAX 4
> +
> +#endif
> +
> diff --git a/hw/net/pvrdma/pvrdma-uapi.h b/hw/net/pvrdma/pvrdma-uapi.h
> new file mode 100644
> index 0000000..0045776
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma-uapi.h
> @@ -0,0 +1,261 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_UAPI_H
> +#define PVRDMA_UAPI_H
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <qemu/compiler.h>
> +#include <qemu/atomic.h>
> +
> +#define PVRDMA_VERSION 17
> +
> +#define PVRDMA_UAR_HANDLE_MASK 0x00FFFFFF /* Bottom 24 bits. */
> +#define PVRDMA_UAR_QP_OFFSET 0 /* Offset of QP doorbell. */
> +#define PVRDMA_UAR_QP_SEND BIT(30) /* Send bit. */
> +#define PVRDMA_UAR_QP_RECV BIT(31) /* Recv bit. */
> +#define PVRDMA_UAR_CQ_OFFSET 4 /* Offset of CQ doorbell. */
> +#define PVRDMA_UAR_CQ_ARM_SOL BIT(29) /* Arm solicited bit. */
> +#define PVRDMA_UAR_CQ_ARM BIT(30) /* Arm bit. */
> +#define PVRDMA_UAR_CQ_POLL BIT(31) /* Poll bit. */
> +#define PVRDMA_INVALID_IDX -1 /* Invalid index. */
> +
> +/* PVRDMA atomic compare and swap */
> +struct pvrdma_exp_cmp_swap {
> + __u64 swap_val;
> + __u64 compare_val;
> + __u64 swap_mask;
> + __u64 compare_mask;
> +};
> +
> +/* PVRDMA atomic fetch and add */
> +struct pvrdma_exp_fetch_add {
> + __u64 add_val;
> + __u64 field_boundary;
> +};
> +
> +/* PVRDMA address vector. */
> +struct pvrdma_av {
> + __u32 port_pd;
> + __u32 sl_tclass_flowlabel;
> + __u8 dgid[16];
> + __u8 src_path_bits;
> + __u8 gid_index;
> + __u8 stat_rate;
> + __u8 hop_limit;
> + __u8 dmac[6];
> + __u8 reserved[6];
> +};
> +
> +/* PVRDMA scatter/gather entry */
> +struct pvrdma_sge {
> + __u64 addr;
> + __u32 length;
> + __u32 lkey;
> +};
> +
> +/* PVRDMA receive queue work request */
> +struct pvrdma_rq_wqe_hdr {
> + __u64 wr_id; /* wr id */
> + __u32 num_sge; /* size of s/g array */
> + __u32 total_len; /* reserved */
> +};
> +/* Use pvrdma_sge (ib_sge) for receive queue s/g array elements. */
> +
> +/* PVRDMA send queue work request */
> +struct pvrdma_sq_wqe_hdr {
> + __u64 wr_id; /* wr id */
> + __u32 num_sge; /* size of s/g array */
> + __u32 total_len; /* reserved */
> + __u32 opcode; /* operation type */
> + __u32 send_flags; /* wr flags */
> + union {
> + __u32 imm_data;
> + __u32 invalidate_rkey;
> + } ex;
> + __u32 reserved;
> + union {
> + struct {
> + __u64 remote_addr;
> + __u32 rkey;
> + __u8 reserved[4];
> + } rdma;
> + struct {
> + __u64 remote_addr;
> + __u64 compare_add;
> + __u64 swap;
> + __u32 rkey;
> + __u32 reserved;
> + } atomic;
> + struct {
> + __u64 remote_addr;
> + __u32 log_arg_sz;
> + __u32 rkey;
> + union {
> + struct pvrdma_exp_cmp_swap cmp_swap;
> + struct pvrdma_exp_fetch_add fetch_add;
> + } wr_data;
> + } masked_atomics;
> + struct {
> + __u64 iova_start;
> + __u64 pl_pdir_dma;
> + __u32 page_shift;
> + __u32 page_list_len;
> + __u32 length;
> + __u32 access_flags;
> + __u32 rkey;
> + } fast_reg;
> + struct {
> + __u32 remote_qpn;
> + __u32 remote_qkey;
> + struct pvrdma_av av;
> + } ud;
> + } wr;
> +};
> +/* Use pvrdma_sge (ib_sge) for send queue s/g array elements. */
> +
> +/* Completion queue element. */
> +struct pvrdma_cqe {
> + __u64 wr_id;
> + __u64 qp;
> + __u32 opcode;
> + __u32 status;
> + __u32 byte_len;
> + __u32 imm_data;
> + __u32 src_qp;
> + __u32 wc_flags;
> + __u32 vendor_err;
> + __u16 pkey_index;
> + __u16 slid;
> + __u8 sl;
> + __u8 dlid_path_bits;
> + __u8 port_num;
> + __u8 smac[6];
> + __u8 reserved2[7]; /* Pad to next power of 2 (64). */
> +};
> +
> +struct pvrdma_ring {
> + int prod_tail; /* Producer tail. */
> + int cons_head; /* Consumer head. */
> +};
> +
> +struct pvrdma_ring_state {
> + struct pvrdma_ring tx; /* Tx ring. */
> + struct pvrdma_ring rx; /* Rx ring. */
> +};
> +
> +static inline int pvrdma_idx_valid(__u32 idx, __u32 max_elems)
> +{
> + /* Generates fewer instructions than a less-than. */
> + return (idx & ~((max_elems << 1) - 1)) == 0;
> +}
> +
> +static inline __s32 pvrdma_idx(int *var, __u32 max_elems)
> +{
> + unsigned int idx = atomic_read(var);
> +
> + if (pvrdma_idx_valid(idx, max_elems)) {
> + return idx & (max_elems - 1);
> + }
> + return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline void pvrdma_idx_ring_inc(int *var, __u32 max_elems)
> +{
> + __u32 idx = atomic_read(var) + 1; /* Increment. */
> +
> + idx &= (max_elems << 1) - 1; /* Modulo size, flip gen. */
> + atomic_set(var, idx);
> +}
> +
> +static inline __s32 pvrdma_idx_ring_has_space(const struct pvrdma_ring *r,
> + __u32 max_elems, __u32 *out_tail)
> +{
> + const __u32 tail = atomic_read(&r->prod_tail);
> + const __u32 head = atomic_read(&r->cons_head);
> +
> + if (pvrdma_idx_valid(tail, max_elems) &&
> + pvrdma_idx_valid(head, max_elems)) {
> + *out_tail = tail & (max_elems - 1);
> + return tail != (head ^ max_elems);
> + }
> + return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline __s32 pvrdma_idx_ring_has_data(const struct pvrdma_ring *r,
> + __u32 max_elems, __u32 *out_head)
> +{
> + const __u32 tail = atomic_read(&r->prod_tail);
> + const __u32 head = atomic_read(&r->cons_head);
> +
> + if (pvrdma_idx_valid(tail, max_elems) &&
> + pvrdma_idx_valid(head, max_elems)) {
> + *out_head = head & (max_elems - 1);
> + return tail != head;
> + }
> + return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline bool pvrdma_idx_ring_is_valid_idx(const struct pvrdma_ring *r,
> + __u32 max_elems, __u32 *idx)
> +{
> + const __u32 tail = atomic_read(&r->prod_tail);
> + const __u32 head = atomic_read(&r->cons_head);
> +
> + if (pvrdma_idx_valid(tail, max_elems) &&
> + pvrdma_idx_valid(head, max_elems) &&
> + pvrdma_idx_valid(*idx, max_elems)) {
> + if (tail > head && (*idx < tail && *idx >= head)) {
> + return true;
> + } else if (head > tail && (*idx >= head || *idx < tail)) {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +#endif /* PVRDMA_UAPI_H */
> diff --git a/hw/net/pvrdma/pvrdma.h b/hw/net/pvrdma/pvrdma.h
> new file mode 100644
> index 0000000..d6349d4
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma.h
> @@ -0,0 +1,155 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_PVRDMA_H
> +#define PVRDMA_PVRDMA_H
> +
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +#include <hw/pci/msix.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma_defs.h>
> +#include <hw/net/pvrdma/pvrdma_dev_api.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +
> +/* BARs */
> +#define RDMA_MSIX_BAR_IDX 0
> +#define RDMA_REG_BAR_IDX 1
> +#define RDMA_UAR_BAR_IDX 2
> +#define RDMA_BAR0_MSIX_SIZE (16 * 1024)
> +#define RDMA_BAR1_REGS_SIZE 256
> +#define RDMA_BAR2_UAR_SIZE (16 * 1024)
> +
> +/* MSIX */
> +#define RDMA_MAX_INTRS 3
> +#define RDMA_MSIX_TABLE 0x0000
> +#define RDMA_MSIX_PBA 0x2000
> +
> +/* Interrupts Vectors */
> +#define INTR_VEC_CMD_RING 0
> +#define INTR_VEC_CMD_ASYNC_EVENTS 1
> +#define INTR_VEC_CMD_COMPLETION_Q 2
> +
> +/* HW attributes */
> +#define PVRDMA_HW_NAME "pvrdma"
> +#define PVRDMA_HW_VERSION 17
> +#define PVRDMA_FW_VERSION 14
> +
> +/* Vendor Errors, codes 100 to FFF kept for kdbr */
> +#define VENDOR_ERR_TOO_MANY_SGES 0x201
> +#define VENDOR_ERR_NOMEM 0x202
> +#define VENDOR_ERR_FAIL_KDBR 0x203
> +
> +typedef struct HWResourceIDs {
> + unsigned long *local_bitmap;
> + __u32 *hw_map;
> +} HWResourceIDs;
> +
> +typedef struct DSRInfo {
> + dma_addr_t dma;
> + struct pvrdma_device_shared_region *dsr;
> +
> + union pvrdma_cmd_req *req;
> + union pvrdma_cmd_resp *rsp;
> +
> + struct pvrdma_ring *async_ring_state;
> + Ring async;
> +
> + struct pvrdma_ring *cq_ring_state;
> + Ring cq;
> +} DSRInfo;
> +
> +typedef struct PVRDMADev {
> + PCIDevice parent_obj;
> + MemoryRegion msix;
> + MemoryRegion regs;
> + __u32 regs_data[RDMA_BAR1_REGS_SIZE];
> + MemoryRegion uar;
> + __u32 uar_data[RDMA_BAR2_UAR_SIZE];
> + DSRInfo dsr_info;
> + int interrupt_mask;
> + RmPort ports[MAX_PORTS];
> + u64 sys_image_guid;
> + u64 node_guid;
> + u64 network_prefix;
> + RmResTbl pd_tbl;
> + RmResTbl mr_tbl;
> + RmResTbl qp_tbl;
> + RmResTbl cq_tbl;
> + RmResTbl wqe_ctx_tbl;
> +} PVRDMADev;
> +#define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME)
> +
> +static inline int get_reg_val(PVRDMADev *dev, hwaddr addr, __u32 *val)
> +{
> + int idx = addr >> 2;
> +
> + if (idx >= RDMA_BAR1_REGS_SIZE) {
> + return -EINVAL;
> + }
> +
> + *val = dev->regs_data[idx];
> +
> + return 0;
> +}
> +static inline int set_reg_val(PVRDMADev *dev, hwaddr addr, __u32 val)
> +{
> + int idx = addr >> 2;
> +
> + if (idx >= RDMA_BAR1_REGS_SIZE) {
> + return -EINVAL;
> + }
> +
> + dev->regs_data[idx] = val;
> +
> + return 0;
> +}
> +static inline int get_uar_val(PVRDMADev *dev, hwaddr addr, __u32 *val)
> +{
> + int idx = addr >> 2;
> +
> + if (idx >= RDMA_BAR2_UAR_SIZE) {
> + return -EINVAL;
> + }
> +
> + *val = dev->uar_data[idx];
> +
> + return 0;
> +}
> +static inline int set_uar_val(PVRDMADev *dev, hwaddr addr, __u32 val)
> +{
> + int idx = addr >> 2;
> +
> + if (idx >= RDMA_BAR2_UAR_SIZE) {
> + return -EINVAL;
> + }
> +
> + dev->uar_data[idx] = val;
> +
> + return 0;
> +}
> +
> +static inline void post_interrupt(PVRDMADev *dev, unsigned vector)
> +{
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> + if (likely(dev->interrupt_mask == 0)) {
> + msix_notify(pci_dev, vector);
> + }
> +}
> +
> +int execute_command(PVRDMADev *dev);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_cmd.c b/hw/net/pvrdma/pvrdma_cmd.c
> new file mode 100644
> index 0000000..ae1ef99
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_cmd.c
> @@ -0,0 +1,322 @@
> +#include "qemu/osdep.h"
> +#include "hw/hw.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_ids.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +
> +static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_query_port *cmd = &req->query_port;
> + struct pvrdma_cmd_query_port_resp *resp = &rsp->query_port_resp;
> + __u32 max_port_gids, max_port_pkeys;
> +
> + pr_dbg("port=%d\n", cmd->port_num);
> +
> + if (rm_get_max_port_gids(&max_port_gids) != 0) {
> + return -ENOMEM;
> + }
> +
> + if (rm_get_max_port_pkeys(&max_port_pkeys) != 0) {
> + return -ENOMEM;
> + }
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP;
> + resp->hdr.err = 0;
> +
> + resp->attrs.state = PVRDMA_PORT_ACTIVE;
> + resp->attrs.max_mtu = PVRDMA_MTU_4096;
> + resp->attrs.active_mtu = PVRDMA_MTU_4096;
> + resp->attrs.gid_tbl_len = max_port_gids;
> + resp->attrs.port_cap_flags = 0;
> + resp->attrs.max_msg_sz = 1024;
> + resp->attrs.bad_pkey_cntr = 0;
> + resp->attrs.qkey_viol_cntr = 0;
> + resp->attrs.pkey_tbl_len = max_port_pkeys;
> + resp->attrs.lid = 0;
> + resp->attrs.sm_lid = 0;
> + resp->attrs.lmc = 0;
> + resp->attrs.max_vl_num = 0;
> + resp->attrs.sm_sl = 0;
> + resp->attrs.subnet_timeout = 0;
> + resp->attrs.init_type_reply = 0;
> + resp->attrs.active_width = 1;
> + resp->attrs.active_speed = 1;
> + resp->attrs.phys_state = 1;
> +
> + return 0;
> +}
> +
> +static int query_pkey(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_query_pkey *cmd = &req->query_pkey;
> + struct pvrdma_cmd_query_pkey_resp *resp = &rsp->query_pkey_resp;
> +
> + pr_dbg("port=%d\n", cmd->port_num);
> + pr_dbg("index=%d\n", cmd->index);
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_QUERY_PKEY_RESP;
> + resp->hdr.err = 0;
> +
> + resp->pkey = 0x7FFF;
> + pr_dbg("pkey=0x%x\n", resp->pkey);
> +
> + return 0;
> +}
> +
> +static int create_pd(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_create_pd *cmd = &req->create_pd;
> + struct pvrdma_cmd_create_pd_resp *resp = &rsp->create_pd_resp;
> +
> + pr_dbg("context=0x%x\n", cmd->ctx_handle ? cmd->ctx_handle : 0);
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_CREATE_PD_RESP;
> + resp->hdr.err = rm_alloc_pd(dev, &resp->pd_handle, cmd->ctx_handle);
> +
> + pr_dbg("ret=%d\n", resp->hdr.err);
> + return resp->hdr.err;
> +}
> +
> +static int destroy_pd(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_destroy_pd *cmd = &req->destroy_pd;
> +
> + pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> +
> + rm_dealloc_pd(dev, cmd->pd_handle);
> +
> + return 0;
> +}
> +
> +static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_create_mr *cmd = &req->create_mr;
> + struct pvrdma_cmd_create_mr_resp *resp = &rsp->create_mr_resp;
> +
> + pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> + pr_dbg("access_flags=0x%x\n", cmd->access_flags);
> + pr_dbg("flags=0x%x\n", cmd->flags);
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_CREATE_MR_RESP;
> + resp->hdr.err = rm_alloc_mr(dev, cmd, resp);
> +
> + pr_dbg("ret=%d\n", resp->hdr.err);
> + return resp->hdr.err;
> +}
> +
> +static int destroy_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_destroy_mr *cmd = &req->destroy_mr;
> +
> + pr_dbg("mr_handle=%d\n", cmd->mr_handle);
> +
> + rm_dealloc_mr(dev, cmd->mr_handle);
> +
> + return 0;
> +}
> +
> +static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_create_cq *cmd = &req->create_cq;
> + struct pvrdma_cmd_create_cq_resp *resp = &rsp->create_cq_resp;
> +
> + pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)cmd->pdir_dma);
> + pr_dbg("context=0x%x\n", cmd->ctx_handle ? cmd->ctx_handle : 0);
> + pr_dbg("cqe=%d\n", cmd->cqe);
> + pr_dbg("nchunks=%d\n", cmd->nchunks);
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_CREATE_CQ_RESP;
> + resp->hdr.err = rm_alloc_cq(dev, cmd, resp);
> +
> + pr_dbg("ret=%d\n", resp->hdr.err);
> + return resp->hdr.err;
> +}
> +
> +static int destroy_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_destroy_cq *cmd = &req->destroy_cq;
> +
> + pr_dbg("cq_handle=%d\n", cmd->cq_handle);
> +
> + rm_dealloc_cq(dev, cmd->cq_handle);
> +
> + return 0;
> +}
> +
> +static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_create_qp *cmd = &req->create_qp;
> + struct pvrdma_cmd_create_qp_resp *resp = &rsp->create_qp_resp;
> +
> + if (!dev->ports[0].kdbr_port) {
> + pr_dbg("First QP, registering port 0\n");
> + dev->ports[0].kdbr_port = kdbr_alloc_port(dev);
> + if (!dev->ports[0].kdbr_port) {
> + pr_dbg("Fail to register port\n");
> + return -EIO;
> + }
> + }
> +
> + pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> + pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)cmd->pdir_dma);
> + pr_dbg("total_chunks=%d\n", cmd->total_chunks);
> + pr_dbg("send_chunks=%d\n", cmd->send_chunks);
> +
> + memset(resp, 0, sizeof(*resp));
> + resp->hdr.response = cmd->hdr.response;
> + resp->hdr.ack = PVRDMA_CMD_CREATE_QP_RESP;
> + resp->hdr.err = rm_alloc_qp(dev, cmd, resp);
> +
> + pr_dbg("ret=%d\n", resp->hdr.err);
> + return resp->hdr.err;
> +}
> +
> +static int modify_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_modify_qp *cmd = &req->modify_qp;
> +
> + pr_dbg("qp_handle=%d\n", cmd->qp_handle);
> +
> + memset(rsp, 0, sizeof(*rsp));
> + rsp->hdr.response = cmd->hdr.response;
> + rsp->hdr.ack = PVRDMA_CMD_MODIFY_QP_RESP;
> + rsp->hdr.err = rm_modify_qp(dev, cmd->qp_handle, cmd);
> +
> + pr_dbg("ret=%d\n", rsp->hdr.err);
> + return rsp->hdr.err;
> +}
> +
> +static int destroy_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + struct pvrdma_cmd_destroy_qp *cmd = &req->destroy_qp;
> +
> + pr_dbg("qp_handle=%d\n", cmd->qp_handle);
> +
> + rm_dealloc_qp(dev, cmd->qp_handle);
> +
> + return 0;
> +}
> +
> +static int create_bind(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + int rc;
> + struct pvrdma_cmd_create_bind *cmd = &req->create_bind;
> + u32 max_port_gids;
> +#ifdef DEBUG
> + __be64 *subnet = (__be64 *)&cmd->new_gid[0];
> + __be64 *if_id = (__be64 *)&cmd->new_gid[8];
> +#endif
> +
> + pr_dbg("index=%d\n", cmd->index);
> +
> + rc = rm_get_max_port_gids(&max_port_gids);
> + if (rc) {
> + return -EIO;
> + }
> +
> + if (cmd->index >= max_port_gids) {
> + return -EINVAL;
> + }
> +
> + pr_dbg("gid[%d]=0x%llx,0x%llx\n", cmd->index, *subnet, *if_id);
> +
> + /* Driver forces to one port only */
> + memcpy(dev->ports[0].gid_tbl[cmd->index].raw, &cmd->new_gid,
> + sizeof(cmd->new_gid));
> +
> + return 0;
> +}
> +
> +static int destroy_bind(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp)
> +{
> + /* TODO: Check the usage of this table */
> +
> + struct pvrdma_cmd_destroy_bind *cmd = &req->destroy_bind;
> +
> + pr_dbg("clear index %d\n", cmd->index);
> +
> + memset(dev->ports[0].gid_tbl[cmd->index].raw, 0,
> + sizeof(dev->ports[0].gid_tbl[cmd->index].raw));
> +
> + return 0;
> +}
> +
> +struct cmd_handler {
> + __u32 cmd;
> + int (*exec)(PVRDMADev *dev, union pvrdma_cmd_req *req,
> + union pvrdma_cmd_resp *rsp);
> +};
> +
> +static struct cmd_handler cmd_handlers[] = {
> + {PVRDMA_CMD_QUERY_PORT, query_port},
> + {PVRDMA_CMD_QUERY_PKEY, query_pkey},
> + {PVRDMA_CMD_CREATE_PD, create_pd},
> + {PVRDMA_CMD_DESTROY_PD, destroy_pd},
> + {PVRDMA_CMD_CREATE_MR, create_mr},
> + {PVRDMA_CMD_DESTROY_MR, destroy_mr},
> + {PVRDMA_CMD_CREATE_CQ, create_cq},
> + {PVRDMA_CMD_RESIZE_CQ, NULL},
> + {PVRDMA_CMD_DESTROY_CQ, destroy_cq},
> + {PVRDMA_CMD_CREATE_QP, create_qp},
> + {PVRDMA_CMD_MODIFY_QP, modify_qp},
> + {PVRDMA_CMD_QUERY_QP, NULL},
> + {PVRDMA_CMD_DESTROY_QP, destroy_qp},
> + {PVRDMA_CMD_CREATE_UC, NULL},
> + {PVRDMA_CMD_DESTROY_UC, NULL},
> + {PVRDMA_CMD_CREATE_BIND, create_bind},
> + {PVRDMA_CMD_DESTROY_BIND, destroy_bind},
> +};
> +
> +int execute_command(PVRDMADev *dev)
> +{
> + int err = 0xFFFF;
> + DSRInfo *dsr_info;
> +
> + dsr_info = &dev->dsr_info;
> +
> + pr_dbg("cmd=%d\n", dsr_info->req->hdr.cmd);
> + if (dsr_info->req->hdr.cmd >= sizeof(cmd_handlers) /
> + sizeof(struct cmd_handler)) {
> + pr_err("Unsupported command\n");
> + goto out;
> + }
> +
> + if (!cmd_handlers[dsr_info->req->hdr.cmd].exec) {
> + pr_err("Unsupported command (not implemented yet)\n");
> + goto out;
> + }
> +
> + err = cmd_handlers[dsr_info->req->hdr.cmd].exec(dev, dsr_info->req,
> + dsr_info->rsp);
> +out:
> + set_reg_val(dev, PVRDMA_REG_ERR, err);
> + post_interrupt(dev, INTR_VEC_CMD_RING);
> +
> + return (err == 0) ? 0 : -EINVAL;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_defs.h b/hw/net/pvrdma/pvrdma_defs.h
> new file mode 100644
> index 0000000..1d0cc11
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_defs.h
> @@ -0,0 +1,301 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_DEFS_H
> +#define PVRDMA_DEFS_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +
> +/*
> + * Masks and accessors for page directory, which is a two-level lookup:
> + * page directory -> page table -> page. Only one directory for now, but we
> + * could expand that easily. 9 bits for tables, 9 bits for pages, gives one
> + * gigabyte for memory regions and so forth.
> + */
> +
> +#define PVRDMA_PDIR_SHIFT 18
> +#define PVRDMA_PTABLE_SHIFT 9
> +#define PVRDMA_PAGE_DIR_DIR(x) (((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
> +#define PVRDMA_PAGE_DIR_TABLE(x) (((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
> +#define PVRDMA_PAGE_DIR_PAGE(x) ((x) & 0x1ff)
> +#define PVRDMA_PAGE_DIR_MAX_PAGES (1 * 512 * 512)
> +#define PVRDMA_MAX_FAST_REG_PAGES 128
> +
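For readers decoding the layout: a linear page number splits as [1 directory bit | 9 table bits | 9 page bits], so 512 * 512 pages of 4 KiB is the one gigabyte the comment mentions. A standalone sketch of the decomposition (the masks are copied from the hunk above; the `pvrdma_decompose` helper is ours, purely illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from pvrdma_defs.h above; pvrdma_decompose() is only an
 * illustration, not part of the patch. */
#define PVRDMA_PDIR_SHIFT        18
#define PVRDMA_PTABLE_SHIFT      9
#define PVRDMA_PAGE_DIR_DIR(x)   (((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
#define PVRDMA_PAGE_DIR_TABLE(x) (((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
#define PVRDMA_PAGE_DIR_PAGE(x)  ((x) & 0x1ff)

/* Split a linear page number into (directory, table, page) indices. */
static inline void pvrdma_decompose(uint32_t idx, uint32_t *dir,
                                    uint32_t *table, uint32_t *page)
{
    *dir   = PVRDMA_PAGE_DIR_DIR(idx);
    *table = PVRDMA_PAGE_DIR_TABLE(idx);
    *page  = PVRDMA_PAGE_DIR_PAGE(idx);
}
```

The last valid index, PVRDMA_PAGE_DIR_MAX_PAGES - 1 = 262143, still has directory bit 0, consistent with the "only one directory for now" note.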
> +/*
> + * Max MSI-X vectors.
> + */
> +
> +#define PVRDMA_MAX_INTERRUPTS 3
> +
> +/* Register offsets within PCI resource on BAR1. */
> +#define PVRDMA_REG_VERSION 0x00 /* R: Version of device. */
> +#define PVRDMA_REG_DSRLOW 0x04 /* W: Device shared region low PA. */
> +#define PVRDMA_REG_DSRHIGH 0x08 /* W: Device shared region high PA. */
> +#define PVRDMA_REG_CTL 0x0c /* W: PVRDMA_DEVICE_CTL */
> +#define PVRDMA_REG_REQUEST 0x10 /* W: Indicate device request. */
> +#define PVRDMA_REG_ERR 0x14 /* R: Device error. */
> +#define PVRDMA_REG_ICR 0x18 /* R: Interrupt cause. */
> +#define PVRDMA_REG_IMR 0x1c /* R/W: Interrupt mask. */
> +#define PVRDMA_REG_MACL 0x20 /* R/W: MAC address low. */
> +#define PVRDMA_REG_MACH 0x24 /* R/W: MAC address high. */
> +
> +/* Object flags. */
> +#define PVRDMA_CQ_FLAG_ARMED_SOL BIT(0) /* Armed for solicited-only. */
> +#define PVRDMA_CQ_FLAG_ARMED BIT(1) /* Armed. */
> +#define PVRDMA_MR_FLAG_DMA BIT(0) /* DMA region. */
> +#define PVRDMA_MR_FLAG_FRMR BIT(1) /* Fast reg memory region. */
> +
> +/*
> + * Atomic operation capability (masked versions are extended atomic
> + * operations).
> + */
> +
> +#define PVRDMA_ATOMIC_OP_COMP_SWAP BIT(0) /* Compare and swap. */
> +#define PVRDMA_ATOMIC_OP_FETCH_ADD BIT(1) /* Fetch and add. */
> +#define PVRDMA_ATOMIC_OP_MASK_COMP_SWAP BIT(2) /* Masked compare and swap. */
> +#define PVRDMA_ATOMIC_OP_MASK_FETCH_ADD BIT(3) /* Masked fetch and add. */
> +
> +/*
> + * Base Memory Management Extension flags to support Fast Reg Memory Regions
> + * and Fast Reg Work Requests. Each flag represents a verb operation and we
> + * must support all of them to qualify for the BMME device cap.
> + */
> +
> +#define PVRDMA_BMME_FLAG_LOCAL_INV BIT(0) /* Local Invalidate. */
> +#define PVRDMA_BMME_FLAG_REMOTE_INV BIT(1) /* Remote Invalidate. */
> +#define PVRDMA_BMME_FLAG_FAST_REG_WR BIT(2) /* Fast Reg Work Request. */
> +
> +/*
> + * GID types. The interpretation of the gid_types bit field in the device
> + * capabilities will depend on the device mode. For now, the device only
> + * supports RoCE as mode, so only the different GID types for RoCE are
> + * defined.
> + */
> +
> +#define PVRDMA_GID_TYPE_FLAG_ROCE_V1 BIT(0)
> +#define PVRDMA_GID_TYPE_FLAG_ROCE_V2 BIT(1)
> +
> +enum pvrdma_pci_resource {
> + PVRDMA_PCI_RESOURCE_MSIX, /* BAR0: MSI-X, MMIO. */
> + PVRDMA_PCI_RESOURCE_REG, /* BAR1: Registers, MMIO. */
> + PVRDMA_PCI_RESOURCE_UAR, /* BAR2: UAR pages, MMIO, 64-bit. */
> + PVRDMA_PCI_RESOURCE_LAST, /* Last. */
> +};
> +
> +enum pvrdma_device_ctl {
> + PVRDMA_DEVICE_CTL_ACTIVATE, /* Activate device. */
> + PVRDMA_DEVICE_CTL_QUIESCE, /* Quiesce device. */
> + PVRDMA_DEVICE_CTL_RESET, /* Reset device. */
> +};
> +
> +enum pvrdma_intr_vector {
> + PVRDMA_INTR_VECTOR_RESPONSE, /* Command response. */
> + PVRDMA_INTR_VECTOR_ASYNC, /* Async events. */
> + PVRDMA_INTR_VECTOR_CQ, /* CQ notification. */
> + /* Additional CQ notification vectors. */
> +};
> +
> +enum pvrdma_intr_cause {
> + PVRDMA_INTR_CAUSE_RESPONSE = (1 << PVRDMA_INTR_VECTOR_RESPONSE),
> + PVRDMA_INTR_CAUSE_ASYNC = (1 << PVRDMA_INTR_VECTOR_ASYNC),
> + PVRDMA_INTR_CAUSE_CQ = (1 << PVRDMA_INTR_VECTOR_CQ),
> +};
> +
> +enum pvrdma_intr_type {
> + PVRDMA_INTR_TYPE_INTX, /* Legacy. */
> + PVRDMA_INTR_TYPE_MSI, /* MSI. */
> + PVRDMA_INTR_TYPE_MSIX, /* MSI-X. */
> +};
> +
> +enum pvrdma_gos_bits {
> + PVRDMA_GOS_BITS_UNK, /* Unknown. */
> + PVRDMA_GOS_BITS_32, /* 32-bit. */
> + PVRDMA_GOS_BITS_64, /* 64-bit. */
> +};
> +
> +enum pvrdma_gos_type {
> + PVRDMA_GOS_TYPE_UNK, /* Unknown. */
> + PVRDMA_GOS_TYPE_LINUX, /* Linux. */
> +};
> +
> +enum pvrdma_device_mode {
> + PVRDMA_DEVICE_MODE_ROCE, /* RoCE. */
> + PVRDMA_DEVICE_MODE_IWARP, /* iWarp. */
> + PVRDMA_DEVICE_MODE_IB, /* InfiniBand. */
> +};
> +
> +struct pvrdma_gos_info {
> + u32 gos_bits:2; /* W: PVRDMA_GOS_BITS_ */
> + u32 gos_type:4; /* W: PVRDMA_GOS_TYPE_ */
> + u32 gos_ver:16; /* W: Guest OS version. */
> + u32 gos_misc:10; /* W: Other. */
> + u32 pad; /* Pad to 8-byte alignment. */
> +};
> +
> +struct pvrdma_device_caps {
> + u64 fw_ver; /* R: Query device. */
> + __be64 node_guid;
> + __be64 sys_image_guid;
> + u64 max_mr_size;
> + u64 page_size_cap;
> + u64 atomic_arg_sizes; /* EXP verbs. */
> + u32 exp_comp_mask; /* EXP verbs. */
> + u32 device_cap_flags2; /* EXP verbs. */
> + u32 max_fa_bit_boundary; /* EXP verbs. */
> + u32 log_max_atomic_inline_arg; /* EXP verbs. */
> + u32 vendor_id;
> + u32 vendor_part_id;
> + u32 hw_ver;
> + u32 max_qp;
> + u32 max_qp_wr;
> + u32 device_cap_flags;
> + u32 max_sge;
> + u32 max_sge_rd;
> + u32 max_cq;
> + u32 max_cqe;
> + u32 max_mr;
> + u32 max_pd;
> + u32 max_qp_rd_atom;
> + u32 max_ee_rd_atom;
> + u32 max_res_rd_atom;
> + u32 max_qp_init_rd_atom;
> + u32 max_ee_init_rd_atom;
> + u32 max_ee;
> + u32 max_rdd;
> + u32 max_mw;
> + u32 max_raw_ipv6_qp;
> + u32 max_raw_ethy_qp;
> + u32 max_mcast_grp;
> + u32 max_mcast_qp_attach;
> + u32 max_total_mcast_qp_attach;
> + u32 max_ah;
> + u32 max_fmr;
> + u32 max_map_per_fmr;
> + u32 max_srq;
> + u32 max_srq_wr;
> + u32 max_srq_sge;
> + u32 max_uar;
> + u32 gid_tbl_len;
> + u16 max_pkeys;
> + u8 local_ca_ack_delay;
> + u8 phys_port_cnt;
> + u8 mode; /* PVRDMA_DEVICE_MODE_ */
> + u8 atomic_ops; /* PVRDMA_ATOMIC_OP_* bits */
> + u8 bmme_flags; /* FRWR Mem Mgmt Extensions */
> + u8 gid_types; /* PVRDMA_GID_TYPE_FLAG_ */
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_ring_page_info {
> + u32 num_pages; /* Num pages incl. header. */
> + u32 reserved; /* Reserved. */
> + u64 pdir_dma; /* Page directory PA. */
> +};
> +
> +#pragma pack(push, 1)
> +
> +struct pvrdma_device_shared_region {
> + u32 driver_version; /* W: Driver version. */
> + u32 pad; /* Pad to 8-byte align. */
> + struct pvrdma_gos_info gos_info; /* W: Guest OS information. */
> + u64 cmd_slot_dma; /* W: Command slot address. */
> + u64 resp_slot_dma; /* W: Response slot address. */
> + struct pvrdma_ring_page_info async_ring_pages;
> + /* W: Async ring page info. */
> + struct pvrdma_ring_page_info cq_ring_pages;
> + /* W: CQ ring page info. */
> + u32 uar_pfn; /* W: UAR pageframe. */
> + u32 pad2; /* Pad to 8-byte align. */
> + struct pvrdma_device_caps caps; /* R: Device capabilities. */
> +};
> +
> +#pragma pack(pop)
> +
> +
> +/* Event types. Currently a 1:1 mapping with enum ib_event. */
> +enum pvrdma_eqe_type {
> + PVRDMA_EVENT_CQ_ERR,
> + PVRDMA_EVENT_QP_FATAL,
> + PVRDMA_EVENT_QP_REQ_ERR,
> + PVRDMA_EVENT_QP_ACCESS_ERR,
> + PVRDMA_EVENT_COMM_EST,
> + PVRDMA_EVENT_SQ_DRAINED,
> + PVRDMA_EVENT_PATH_MIG,
> + PVRDMA_EVENT_PATH_MIG_ERR,
> + PVRDMA_EVENT_DEVICE_FATAL,
> + PVRDMA_EVENT_PORT_ACTIVE,
> + PVRDMA_EVENT_PORT_ERR,
> + PVRDMA_EVENT_LID_CHANGE,
> + PVRDMA_EVENT_PKEY_CHANGE,
> + PVRDMA_EVENT_SM_CHANGE,
> + PVRDMA_EVENT_SRQ_ERR,
> + PVRDMA_EVENT_SRQ_LIMIT_REACHED,
> + PVRDMA_EVENT_QP_LAST_WQE_REACHED,
> + PVRDMA_EVENT_CLIENT_REREGISTER,
> + PVRDMA_EVENT_GID_CHANGE,
> +};
> +
> +/* Event queue element. */
> +struct pvrdma_eqe {
> + u32 type; /* Event type. */
> + u32 info; /* Handle, other. */
> +};
> +
> +/* CQ notification queue element. */
> +struct pvrdma_cqne {
> + u32 info; /* Handle */
> +};
> +
> +static inline void pvrdma_init_cqe(struct pvrdma_cqe *cqe, u64 wr_id, u64 qp)
> +{
> + memset(cqe, 0, sizeof(*cqe));
> + cqe->status = PVRDMA_WC_GENERAL_ERR;
> + cqe->wr_id = wr_id;
> + cqe->qp = qp;
> +}
> +
> +#endif /* PVRDMA_DEFS_H */
> diff --git a/hw/net/pvrdma/pvrdma_dev_api.h b/hw/net/pvrdma/pvrdma_dev_api.h
> new file mode 100644
> index 0000000..4887b96
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_dev_api.h
> @@ -0,0 +1,342 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_DEV_API_H
> +#define PVRDMA_DEV_API_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +
> +enum {
> + PVRDMA_CMD_FIRST,
> + PVRDMA_CMD_QUERY_PORT = PVRDMA_CMD_FIRST,
> + PVRDMA_CMD_QUERY_PKEY,
> + PVRDMA_CMD_CREATE_PD,
> + PVRDMA_CMD_DESTROY_PD,
> + PVRDMA_CMD_CREATE_MR,
> + PVRDMA_CMD_DESTROY_MR,
> + PVRDMA_CMD_CREATE_CQ,
> + PVRDMA_CMD_RESIZE_CQ,
> + PVRDMA_CMD_DESTROY_CQ,
> + PVRDMA_CMD_CREATE_QP,
> + PVRDMA_CMD_MODIFY_QP,
> + PVRDMA_CMD_QUERY_QP,
> + PVRDMA_CMD_DESTROY_QP,
> + PVRDMA_CMD_CREATE_UC,
> + PVRDMA_CMD_DESTROY_UC,
> + PVRDMA_CMD_CREATE_BIND,
> + PVRDMA_CMD_DESTROY_BIND,
> + PVRDMA_CMD_MAX,
> +};
> +
> +enum {
> + PVRDMA_CMD_FIRST_RESP = (1 << 31),
> + PVRDMA_CMD_QUERY_PORT_RESP = PVRDMA_CMD_FIRST_RESP,
> + PVRDMA_CMD_QUERY_PKEY_RESP,
> + PVRDMA_CMD_CREATE_PD_RESP,
> + PVRDMA_CMD_DESTROY_PD_RESP_NOOP,
> + PVRDMA_CMD_CREATE_MR_RESP,
> + PVRDMA_CMD_DESTROY_MR_RESP_NOOP,
> + PVRDMA_CMD_CREATE_CQ_RESP,
> + PVRDMA_CMD_RESIZE_CQ_RESP,
> + PVRDMA_CMD_DESTROY_CQ_RESP_NOOP,
> + PVRDMA_CMD_CREATE_QP_RESP,
> + PVRDMA_CMD_MODIFY_QP_RESP,
> + PVRDMA_CMD_QUERY_QP_RESP,
> + PVRDMA_CMD_DESTROY_QP_RESP,
> + PVRDMA_CMD_CREATE_UC_RESP,
> + PVRDMA_CMD_DESTROY_UC_RESP_NOOP,
> + PVRDMA_CMD_CREATE_BIND_RESP_NOOP,
> + PVRDMA_CMD_DESTROY_BIND_RESP_NOOP,
> + PVRDMA_CMD_MAX_RESP,
> +};
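One property worth noting for whoever wires up the command ring: the two enums advance in lockstep (commands count up from 0, responses count up from 1 << 31 in the same order), so a command's expected ack is simply the command code with bit 31 set. A throwaway check of that invariant (the `pvrdma_expected_ack` helper name is ours, not the patch's):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: mirrors the pairing implied by the two enums
 * above.  CREATE_MR is the 5th entry (value 4) in both sequences. */
#define PVRDMA_CMD_CREATE_MR      4u
#define PVRDMA_CMD_CREATE_MR_RESP ((1u << 31) + 4u)

static inline uint32_t pvrdma_expected_ack(uint32_t cmd)
{
    return cmd | (1u << 31);   /* response ack = command | top bit */
}
```

Side note: writing the enum base as `(1 << 31)` shifts into the sign bit of `int`; `(1u << 31)` as used here avoids that.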
> +
> +struct pvrdma_cmd_hdr {
> + u64 response; /* Key for response lookup. */
> + u32 cmd; /* PVRDMA_CMD_ */
> + u32 reserved; /* Reserved. */
> +};
> +
> +struct pvrdma_cmd_resp_hdr {
> + u64 response; /* From cmd hdr. */
> + u32 ack; /* PVRDMA_CMD_XXX_RESP */
> + u8 err; /* Error. */
> + u8 reserved[3]; /* Reserved. */
> +};
> +
> +struct pvrdma_cmd_query_port {
> + struct pvrdma_cmd_hdr hdr;
> + u8 port_num;
> + u8 reserved[7];
> +};
> +
> +struct pvrdma_cmd_query_port_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + struct pvrdma_port_attr attrs;
> +};
> +
> +struct pvrdma_cmd_query_pkey {
> + struct pvrdma_cmd_hdr hdr;
> + u8 port_num;
> + u8 index;
> + u8 reserved[6];
> +};
> +
> +struct pvrdma_cmd_query_pkey_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u16 pkey;
> + u8 reserved[6];
> +};
> +
> +struct pvrdma_cmd_create_uc {
> + struct pvrdma_cmd_hdr hdr;
> + u32 pfn; /* UAR page frame number */
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_uc_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 ctx_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_uc {
> + struct pvrdma_cmd_hdr hdr;
> + u32 ctx_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_pd {
> + struct pvrdma_cmd_hdr hdr;
> + u32 ctx_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_pd_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 pd_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_pd {
> + struct pvrdma_cmd_hdr hdr;
> + u32 pd_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_mr {
> + struct pvrdma_cmd_hdr hdr;
> + u64 start;
> + u64 length;
> + u64 pdir_dma;
> + u32 pd_handle;
> + u32 access_flags;
> + u32 flags;
> + u32 nchunks;
> +};
> +
> +struct pvrdma_cmd_create_mr_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 mr_handle;
> + u32 lkey;
> + u32 rkey;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_mr {
> + struct pvrdma_cmd_hdr hdr;
> + u32 mr_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_cq {
> + struct pvrdma_cmd_hdr hdr;
> + u64 pdir_dma;
> + u32 ctx_handle;
> + u32 cqe;
> + u32 nchunks;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_cq_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 cq_handle;
> + u32 cqe;
> +};
> +
> +struct pvrdma_cmd_resize_cq {
> + struct pvrdma_cmd_hdr hdr;
> + u32 cq_handle;
> + u32 cqe;
> +};
> +
> +struct pvrdma_cmd_resize_cq_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 cqe;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_cq {
> + struct pvrdma_cmd_hdr hdr;
> + u32 cq_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_qp {
> + struct pvrdma_cmd_hdr hdr;
> + u64 pdir_dma;
> + u32 pd_handle;
> + u32 send_cq_handle;
> + u32 recv_cq_handle;
> + u32 srq_handle;
> + u32 max_send_wr;
> + u32 max_recv_wr;
> + u32 max_send_sge;
> + u32 max_recv_sge;
> + u32 max_inline_data;
> + u32 lkey;
> + u32 access_flags;
> + u16 total_chunks;
> + u16 send_chunks;
> + u16 max_atomic_arg;
> + u8 sq_sig_all;
> + u8 qp_type;
> + u8 is_srq;
> + u8 reserved[3];
> +};
> +
> +struct pvrdma_cmd_create_qp_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 qpn;
> + u32 max_send_wr;
> + u32 max_recv_wr;
> + u32 max_send_sge;
> + u32 max_recv_sge;
> + u32 max_inline_data;
> +};
> +
> +struct pvrdma_cmd_modify_qp {
> + struct pvrdma_cmd_hdr hdr;
> + u32 qp_handle;
> + u32 attr_mask;
> + struct pvrdma_qp_attr attrs;
> +};
> +
> +struct pvrdma_cmd_query_qp {
> + struct pvrdma_cmd_hdr hdr;
> + u32 qp_handle;
> + u32 attr_mask;
> +};
> +
> +struct pvrdma_cmd_query_qp_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + struct pvrdma_qp_attr attrs;
> +};
> +
> +struct pvrdma_cmd_destroy_qp {
> + struct pvrdma_cmd_hdr hdr;
> + u32 qp_handle;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_qp_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + u32 events_reported;
> + u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_bind {
> + struct pvrdma_cmd_hdr hdr;
> + u32 mtu;
> + u32 vlan;
> + u32 index;
> + u8 new_gid[16];
> + u8 gid_type;
> + u8 reserved[3];
> +};
> +
> +struct pvrdma_cmd_destroy_bind {
> + struct pvrdma_cmd_hdr hdr;
> + u32 index;
> + u8 dest_gid[16];
> + u8 reserved[4];
> +};
> +
> +union pvrdma_cmd_req {
> + struct pvrdma_cmd_hdr hdr;
> + struct pvrdma_cmd_query_port query_port;
> + struct pvrdma_cmd_query_pkey query_pkey;
> + struct pvrdma_cmd_create_uc create_uc;
> + struct pvrdma_cmd_destroy_uc destroy_uc;
> + struct pvrdma_cmd_create_pd create_pd;
> + struct pvrdma_cmd_destroy_pd destroy_pd;
> + struct pvrdma_cmd_create_mr create_mr;
> + struct pvrdma_cmd_destroy_mr destroy_mr;
> + struct pvrdma_cmd_create_cq create_cq;
> + struct pvrdma_cmd_resize_cq resize_cq;
> + struct pvrdma_cmd_destroy_cq destroy_cq;
> + struct pvrdma_cmd_create_qp create_qp;
> + struct pvrdma_cmd_modify_qp modify_qp;
> + struct pvrdma_cmd_query_qp query_qp;
> + struct pvrdma_cmd_destroy_qp destroy_qp;
> + struct pvrdma_cmd_create_bind create_bind;
> + struct pvrdma_cmd_destroy_bind destroy_bind;
> +};
> +
> +union pvrdma_cmd_resp {
> + struct pvrdma_cmd_resp_hdr hdr;
> + struct pvrdma_cmd_query_port_resp query_port_resp;
> + struct pvrdma_cmd_query_pkey_resp query_pkey_resp;
> + struct pvrdma_cmd_create_uc_resp create_uc_resp;
> + struct pvrdma_cmd_create_pd_resp create_pd_resp;
> + struct pvrdma_cmd_create_mr_resp create_mr_resp;
> + struct pvrdma_cmd_create_cq_resp create_cq_resp;
> + struct pvrdma_cmd_resize_cq_resp resize_cq_resp;
> + struct pvrdma_cmd_create_qp_resp create_qp_resp;
> + struct pvrdma_cmd_query_qp_resp query_qp_resp;
> + struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp;
> +};
> +
> +#endif /* PVRDMA_DEV_API_H */
> diff --git a/hw/net/pvrdma/pvrdma_ib_verbs.h b/hw/net/pvrdma/pvrdma_ib_verbs.h
> new file mode 100644
> index 0000000..e2a23f3
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ib_verbs.h
> @@ -0,0 +1,469 @@
> +/*
> + * [PLEASE NOTE: VMWARE, INC. ELECTS TO USE AND DISTRIBUTE THIS COMPONENT
> + * UNDER THE TERMS OF THE OpenIB.org BSD license. THE ORIGINAL LICENSE TERMS
> + * ARE REPRODUCED BELOW ONLY AS A REFERENCE.]
> + *
> + * Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2004 Infinicon Corporation. All rights reserved.
> + * Copyright (c) 2004 Intel Corporation. All rights reserved.
> + * Copyright (c) 2004 Topspin Corporation. All rights reserved.
> + * Copyright (c) 2004 Voltaire Corporation. All rights reserved.
> + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2005, 2006, 2007 Cisco Systems. All rights reserved.
> + * Copyright (c) 2015-2016 VMware, Inc. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses. You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef PVRDMA_IB_VERBS_H
> +#define PVRDMA_IB_VERBS_H
> +
> +#include <linux/types.h>
> +
> +union pvrdma_gid {
> + u8 raw[16];
> + struct {
> + __be64 subnet_prefix;
> + __be64 interface_id;
> + } global;
> +};
> +
> +enum pvrdma_link_layer {
> + PVRDMA_LINK_LAYER_UNSPECIFIED,
> + PVRDMA_LINK_LAYER_INFINIBAND,
> + PVRDMA_LINK_LAYER_ETHERNET,
> +};
> +
> +enum pvrdma_mtu {
> + PVRDMA_MTU_256 = 1,
> + PVRDMA_MTU_512 = 2,
> + PVRDMA_MTU_1024 = 3,
> + PVRDMA_MTU_2048 = 4,
> + PVRDMA_MTU_4096 = 5,
> +};
> +
> +static inline int pvrdma_mtu_enum_to_int(enum pvrdma_mtu mtu)
> +{
> + switch (mtu) {
> + case PVRDMA_MTU_256: return 256;
> + case PVRDMA_MTU_512: return 512;
> + case PVRDMA_MTU_1024: return 1024;
> + case PVRDMA_MTU_2048: return 2048;
> + case PVRDMA_MTU_4096: return 4096;
> + default: return -1;
> + }
> +}
> +
> +static inline enum pvrdma_mtu pvrdma_mtu_int_to_enum(int mtu)
> +{
> + switch (mtu) {
> + case 256: return PVRDMA_MTU_256;
> + case 512: return PVRDMA_MTU_512;
> + case 1024: return PVRDMA_MTU_1024;
> + case 2048: return PVRDMA_MTU_2048;
> + case 4096:
> + default: return PVRDMA_MTU_4096;
> + }
> +}
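Minor observation on the two MTU helpers: they are exact inverses on the five defined sizes, and any unrecognized integer clamps to PVRDMA_MTU_4096, so a bogus guest value silently becomes the largest MTU; that might deserve a comment in the driver. A standalone copy (enum and helpers taken verbatim from the hunk above) for checking the round trip:

```c
#include <assert.h>

/* Copied verbatim from pvrdma_ib_verbs.h above so the round-trip
 * property can be verified standalone. */
enum pvrdma_mtu {
    PVRDMA_MTU_256  = 1,
    PVRDMA_MTU_512  = 2,
    PVRDMA_MTU_1024 = 3,
    PVRDMA_MTU_2048 = 4,
    PVRDMA_MTU_4096 = 5,
};

static inline int pvrdma_mtu_enum_to_int(enum pvrdma_mtu mtu)
{
    switch (mtu) {
    case PVRDMA_MTU_256:  return  256;
    case PVRDMA_MTU_512:  return  512;
    case PVRDMA_MTU_1024: return 1024;
    case PVRDMA_MTU_2048: return 2048;
    case PVRDMA_MTU_4096: return 4096;
    default:              return   -1;
    }
}

static inline enum pvrdma_mtu pvrdma_mtu_int_to_enum(int mtu)
{
    switch (mtu) {
    case  256: return PVRDMA_MTU_256;
    case  512: return PVRDMA_MTU_512;
    case 1024: return PVRDMA_MTU_1024;
    case 2048: return PVRDMA_MTU_2048;
    case 4096:                          /* fall through */
    default:   return PVRDMA_MTU_4096;  /* unknown sizes clamp to 4096 */
    }
}
```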
> +
> +enum pvrdma_port_state {
> + PVRDMA_PORT_NOP = 0,
> + PVRDMA_PORT_DOWN = 1,
> + PVRDMA_PORT_INIT = 2,
> + PVRDMA_PORT_ARMED = 3,
> + PVRDMA_PORT_ACTIVE = 4,
> + PVRDMA_PORT_ACTIVE_DEFER = 5,
> +};
> +
> +enum pvrdma_port_cap_flags {
> + PVRDMA_PORT_SM = 1 << 1,
> + PVRDMA_PORT_NOTICE_SUP = 1 << 2,
> + PVRDMA_PORT_TRAP_SUP = 1 << 3,
> + PVRDMA_PORT_OPT_IPD_SUP = 1 << 4,
> + PVRDMA_PORT_AUTO_MIGR_SUP = 1 << 5,
> + PVRDMA_PORT_SL_MAP_SUP = 1 << 6,
> + PVRDMA_PORT_MKEY_NVRAM = 1 << 7,
> + PVRDMA_PORT_PKEY_NVRAM = 1 << 8,
> + PVRDMA_PORT_LED_INFO_SUP = 1 << 9,
> + PVRDMA_PORT_SM_DISABLED = 1 << 10,
> + PVRDMA_PORT_SYS_IMAGE_GUID_SUP = 1 << 11,
> + PVRDMA_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = 1 << 12,
> + PVRDMA_PORT_EXTENDED_SPEEDS_SUP = 1 << 14,
> + PVRDMA_PORT_CM_SUP = 1 << 16,
> + PVRDMA_PORT_SNMP_TUNNEL_SUP = 1 << 17,
> + PVRDMA_PORT_REINIT_SUP = 1 << 18,
> + PVRDMA_PORT_DEVICE_MGMT_SUP = 1 << 19,
> + PVRDMA_PORT_VENDOR_CLASS_SUP = 1 << 20,
> + PVRDMA_PORT_DR_NOTICE_SUP = 1 << 21,
> + PVRDMA_PORT_CAP_MASK_NOTICE_SUP = 1 << 22,
> + PVRDMA_PORT_BOOT_MGMT_SUP = 1 << 23,
> + PVRDMA_PORT_LINK_LATENCY_SUP = 1 << 24,
> + PVRDMA_PORT_CLIENT_REG_SUP = 1 << 25,
> + PVRDMA_PORT_IP_BASED_GIDS = 1 << 26,
> + PVRDMA_PORT_CAP_FLAGS_MAX = PVRDMA_PORT_IP_BASED_GIDS,
> +};
> +
> +enum pvrdma_port_width {
> + PVRDMA_WIDTH_1X = 1,
> + PVRDMA_WIDTH_4X = 2,
> + PVRDMA_WIDTH_8X = 4,
> + PVRDMA_WIDTH_12X = 8,
> +};
> +
> +static inline int pvrdma_width_enum_to_int(enum pvrdma_port_width width)
> +{
> + switch (width) {
> + case PVRDMA_WIDTH_1X: return 1;
> + case PVRDMA_WIDTH_4X: return 4;
> + case PVRDMA_WIDTH_8X: return 8;
> + case PVRDMA_WIDTH_12X: return 12;
> + default: return -1;
> + }
> +}
> +
> +enum pvrdma_port_speed {
> + PVRDMA_SPEED_SDR = 1,
> + PVRDMA_SPEED_DDR = 2,
> + PVRDMA_SPEED_QDR = 4,
> + PVRDMA_SPEED_FDR10 = 8,
> + PVRDMA_SPEED_FDR = 16,
> + PVRDMA_SPEED_EDR = 32,
> +};
> +
> +struct pvrdma_port_attr {
> + enum pvrdma_port_state state;
> + enum pvrdma_mtu max_mtu;
> + enum pvrdma_mtu active_mtu;
> + u32 gid_tbl_len;
> + u32 port_cap_flags;
> + u32 max_msg_sz;
> + u32 bad_pkey_cntr;
> + u32 qkey_viol_cntr;
> + u16 pkey_tbl_len;
> + u16 lid;
> + u16 sm_lid;
> + u8 lmc;
> + u8 max_vl_num;
> + u8 sm_sl;
> + u8 subnet_timeout;
> + u8 init_type_reply;
> + u8 active_width;
> + u8 active_speed;
> + u8 phys_state;
> + u8 reserved[2];
> +};
> +
> +struct pvrdma_global_route {
> + union pvrdma_gid dgid;
> + u32 flow_label;
> + u8 sgid_index;
> + u8 hop_limit;
> + u8 traffic_class;
> + u8 reserved;
> +};
> +
> +struct pvrdma_grh {
> + __be32 version_tclass_flow;
> + __be16 paylen;
> + u8 next_hdr;
> + u8 hop_limit;
> + union pvrdma_gid sgid;
> + union pvrdma_gid dgid;
> +};
> +
> +enum pvrdma_ah_flags {
> + PVRDMA_AH_GRH = 1,
> +};
> +
> +enum pvrdma_rate {
> + PVRDMA_RATE_PORT_CURRENT = 0,
> + PVRDMA_RATE_2_5_GBPS = 2,
> + PVRDMA_RATE_5_GBPS = 5,
> + PVRDMA_RATE_10_GBPS = 3,
> + PVRDMA_RATE_20_GBPS = 6,
> + PVRDMA_RATE_30_GBPS = 4,
> + PVRDMA_RATE_40_GBPS = 7,
> + PVRDMA_RATE_60_GBPS = 8,
> + PVRDMA_RATE_80_GBPS = 9,
> + PVRDMA_RATE_120_GBPS = 10,
> + PVRDMA_RATE_14_GBPS = 11,
> + PVRDMA_RATE_56_GBPS = 12,
> + PVRDMA_RATE_112_GBPS = 13,
> + PVRDMA_RATE_168_GBPS = 14,
> + PVRDMA_RATE_25_GBPS = 15,
> + PVRDMA_RATE_100_GBPS = 16,
> + PVRDMA_RATE_200_GBPS = 17,
> + PVRDMA_RATE_300_GBPS = 18,
> +};
> +
> +struct pvrdma_ah_attr {
> + struct pvrdma_global_route grh;
> + u16 dlid;
> + u16 vlan_id;
> + u8 sl;
> + u8 src_path_bits;
> + u8 static_rate;
> + u8 ah_flags;
> + u8 port_num;
> + u8 dmac[6];
> + u8 reserved;
> +};
> +
> +enum pvrdma_wc_status {
> + PVRDMA_WC_SUCCESS,
> + PVRDMA_WC_LOC_LEN_ERR,
> + PVRDMA_WC_LOC_QP_OP_ERR,
> + PVRDMA_WC_LOC_EEC_OP_ERR,
> + PVRDMA_WC_LOC_PROT_ERR,
> + PVRDMA_WC_WR_FLUSH_ERR,
> + PVRDMA_WC_MW_BIND_ERR,
> + PVRDMA_WC_BAD_RESP_ERR,
> + PVRDMA_WC_LOC_ACCESS_ERR,
> + PVRDMA_WC_REM_INV_REQ_ERR,
> + PVRDMA_WC_REM_ACCESS_ERR,
> + PVRDMA_WC_REM_OP_ERR,
> + PVRDMA_WC_RETRY_EXC_ERR,
> + PVRDMA_WC_RNR_RETRY_EXC_ERR,
> + PVRDMA_WC_LOC_RDD_VIOL_ERR,
> + PVRDMA_WC_REM_INV_RD_REQ_ERR,
> + PVRDMA_WC_REM_ABORT_ERR,
> + PVRDMA_WC_INV_EECN_ERR,
> + PVRDMA_WC_INV_EEC_STATE_ERR,
> + PVRDMA_WC_FATAL_ERR,
> + PVRDMA_WC_RESP_TIMEOUT_ERR,
> + PVRDMA_WC_GENERAL_ERR,
> +};
> +
> +enum pvrdma_wc_opcode {
> + PVRDMA_WC_SEND,
> + PVRDMA_WC_RDMA_WRITE,
> + PVRDMA_WC_RDMA_READ,
> + PVRDMA_WC_COMP_SWAP,
> + PVRDMA_WC_FETCH_ADD,
> + PVRDMA_WC_BIND_MW,
> + PVRDMA_WC_LSO,
> + PVRDMA_WC_LOCAL_INV,
> + PVRDMA_WC_FAST_REG_MR,
> + PVRDMA_WC_MASKED_COMP_SWAP,
> + PVRDMA_WC_MASKED_FETCH_ADD,
> + PVRDMA_WC_RECV = 1 << 7,
> + PVRDMA_WC_RECV_RDMA_WITH_IMM,
> +};
> +
> +enum pvrdma_wc_flags {
> + PVRDMA_WC_GRH = 1 << 0,
> + PVRDMA_WC_WITH_IMM = 1 << 1,
> + PVRDMA_WC_WITH_INVALIDATE = 1 << 2,
> + PVRDMA_WC_IP_CSUM_OK = 1 << 3,
> + PVRDMA_WC_WITH_SMAC = 1 << 4,
> + PVRDMA_WC_WITH_VLAN = 1 << 5,
> + PVRDMA_WC_FLAGS_MAX = PVRDMA_WC_WITH_VLAN,
> +};
> +
> +enum pvrdma_cq_notify_flags {
> + PVRDMA_CQ_SOLICITED = 1 << 0,
> + PVRDMA_CQ_NEXT_COMP = 1 << 1,
> + PVRDMA_CQ_SOLICITED_MASK = PVRDMA_CQ_SOLICITED |
> + PVRDMA_CQ_NEXT_COMP,
> + PVRDMA_CQ_REPORT_MISSED_EVENTS = 1 << 2,
> +};
> +
> +struct pvrdma_qp_cap {
> + u32 max_send_wr;
> + u32 max_recv_wr;
> + u32 max_send_sge;
> + u32 max_recv_sge;
> + u32 max_inline_data;
> + u32 reserved;
> +};
> +
> +enum pvrdma_sig_type {
> + PVRDMA_SIGNAL_ALL_WR,
> + PVRDMA_SIGNAL_REQ_WR,
> +};
> +
> +enum pvrdma_qp_type {
> + PVRDMA_QPT_SMI,
> + PVRDMA_QPT_GSI,
> + PVRDMA_QPT_RC,
> + PVRDMA_QPT_UC,
> + PVRDMA_QPT_UD,
> + PVRDMA_QPT_RAW_IPV6,
> + PVRDMA_QPT_RAW_ETHERTYPE,
> + PVRDMA_QPT_RAW_PACKET = 8,
> + PVRDMA_QPT_XRC_INI = 9,
> + PVRDMA_QPT_XRC_TGT,
> + PVRDMA_QPT_MAX,
> +};
> +
> +enum pvrdma_qp_create_flags {
> + PVRDMA_QP_CREATE_IPOPVRDMA_UD_LSO = 1 << 0,
> + PVRDMA_QP_CREATE_BLOCK_MULTICAST_LOOPBACK = 1 << 1,
> +};
> +
> +enum pvrdma_qp_attr_mask {
> + PVRDMA_QP_STATE = 1 << 0,
> + PVRDMA_QP_CUR_STATE = 1 << 1,
> + PVRDMA_QP_EN_SQD_ASYNC_NOTIFY = 1 << 2,
> + PVRDMA_QP_ACCESS_FLAGS = 1 << 3,
> + PVRDMA_QP_PKEY_INDEX = 1 << 4,
> + PVRDMA_QP_PORT = 1 << 5,
> + PVRDMA_QP_QKEY = 1 << 6,
> + PVRDMA_QP_AV = 1 << 7,
> + PVRDMA_QP_PATH_MTU = 1 << 8,
> + PVRDMA_QP_TIMEOUT = 1 << 9,
> + PVRDMA_QP_RETRY_CNT = 1 << 10,
> + PVRDMA_QP_RNR_RETRY = 1 << 11,
> + PVRDMA_QP_RQ_PSN = 1 << 12,
> + PVRDMA_QP_MAX_QP_RD_ATOMIC = 1 << 13,
> + PVRDMA_QP_ALT_PATH = 1 << 14,
> + PVRDMA_QP_MIN_RNR_TIMER = 1 << 15,
> + PVRDMA_QP_SQ_PSN = 1 << 16,
> + PVRDMA_QP_MAX_DEST_RD_ATOMIC = 1 << 17,
> + PVRDMA_QP_PATH_MIG_STATE = 1 << 18,
> + PVRDMA_QP_CAP = 1 << 19,
> + PVRDMA_QP_DEST_QPN = 1 << 20,
> + PVRDMA_QP_ATTR_MASK_MAX = PVRDMA_QP_DEST_QPN,
> +};
> +
> +enum pvrdma_qp_state {
> + PVRDMA_QPS_RESET,
> + PVRDMA_QPS_INIT,
> + PVRDMA_QPS_RTR,
> + PVRDMA_QPS_RTS,
> + PVRDMA_QPS_SQD,
> + PVRDMA_QPS_SQE,
> + PVRDMA_QPS_ERR,
> +};
> +
> +enum pvrdma_mig_state {
> + PVRDMA_MIG_MIGRATED,
> + PVRDMA_MIG_REARM,
> + PVRDMA_MIG_ARMED,
> +};
> +
> +enum pvrdma_mw_type {
> + PVRDMA_MW_TYPE_1 = 1,
> + PVRDMA_MW_TYPE_2 = 2,
> +};
> +
> +struct pvrdma_qp_attr {
> + enum pvrdma_qp_state qp_state;
> + enum pvrdma_qp_state cur_qp_state;
> + enum pvrdma_mtu path_mtu;
> + enum pvrdma_mig_state path_mig_state;
> + u32 qkey;
> + u32 rq_psn;
> + u32 sq_psn;
> + u32 dest_qp_num;
> + u32 qp_access_flags;
> + u16 pkey_index;
> + u16 alt_pkey_index;
> + u8 en_sqd_async_notify;
> + u8 sq_draining;
> + u8 max_rd_atomic;
> + u8 max_dest_rd_atomic;
> + u8 min_rnr_timer;
> + u8 port_num;
> + u8 timeout;
> + u8 retry_cnt;
> + u8 rnr_retry;
> + u8 alt_port_num;
> + u8 alt_timeout;
> + u8 reserved[5];
> + struct pvrdma_qp_cap cap;
> + struct pvrdma_ah_attr ah_attr;
> + struct pvrdma_ah_attr alt_ah_attr;
> +};
> +
> +enum pvrdma_wr_opcode {
> + PVRDMA_WR_RDMA_WRITE,
> + PVRDMA_WR_RDMA_WRITE_WITH_IMM,
> + PVRDMA_WR_SEND,
> + PVRDMA_WR_SEND_WITH_IMM,
> + PVRDMA_WR_RDMA_READ,
> + PVRDMA_WR_ATOMIC_CMP_AND_SWP,
> + PVRDMA_WR_ATOMIC_FETCH_AND_ADD,
> + PVRDMA_WR_LSO,
> + PVRDMA_WR_SEND_WITH_INV,
> + PVRDMA_WR_RDMA_READ_WITH_INV,
> + PVRDMA_WR_LOCAL_INV,
> + PVRDMA_WR_FAST_REG_MR,
> + PVRDMA_WR_MASKED_ATOMIC_CMP_AND_SWP,
> + PVRDMA_WR_MASKED_ATOMIC_FETCH_AND_ADD,
> + PVRDMA_WR_BIND_MW,
> + PVRDMA_WR_REG_SIG_MR,
> +};
> +
> +enum pvrdma_send_flags {
> + PVRDMA_SEND_FENCE = 1 << 0,
> + PVRDMA_SEND_SIGNALED = 1 << 1,
> + PVRDMA_SEND_SOLICITED = 1 << 2,
> + PVRDMA_SEND_INLINE = 1 << 3,
> + PVRDMA_SEND_IP_CSUM = 1 << 4,
> + PVRDMA_SEND_FLAGS_MAX = PVRDMA_SEND_IP_CSUM,
> +};
> +
> +enum pvrdma_access_flags {
> + PVRDMA_ACCESS_LOCAL_WRITE = 1 << 0,
> + PVRDMA_ACCESS_REMOTE_WRITE = 1 << 1,
> + PVRDMA_ACCESS_REMOTE_READ = 1 << 2,
> + PVRDMA_ACCESS_REMOTE_ATOMIC = 1 << 3,
> + PVRDMA_ACCESS_MW_BIND = 1 << 4,
> + PVRDMA_ZERO_BASED = 1 << 5,
> + PVRDMA_ACCESS_ON_DEMAND = 1 << 6,
> + PVRDMA_ACCESS_FLAGS_MAX = PVRDMA_ACCESS_ON_DEMAND,
> +};
> +
> +enum ib_wc_status {
> + IB_WC_SUCCESS,
> + IB_WC_LOC_LEN_ERR,
> + IB_WC_LOC_QP_OP_ERR,
> + IB_WC_LOC_EEC_OP_ERR,
> + IB_WC_LOC_PROT_ERR,
> + IB_WC_WR_FLUSH_ERR,
> + IB_WC_MW_BIND_ERR,
> + IB_WC_BAD_RESP_ERR,
> + IB_WC_LOC_ACCESS_ERR,
> + IB_WC_REM_INV_REQ_ERR,
> + IB_WC_REM_ACCESS_ERR,
> + IB_WC_REM_OP_ERR,
> + IB_WC_RETRY_EXC_ERR,
> + IB_WC_RNR_RETRY_EXC_ERR,
> + IB_WC_LOC_RDD_VIOL_ERR,
> + IB_WC_REM_INV_RD_REQ_ERR,
> + IB_WC_REM_ABORT_ERR,
> + IB_WC_INV_EECN_ERR,
> + IB_WC_INV_EEC_STATE_ERR,
> + IB_WC_FATAL_ERR,
> + IB_WC_RESP_TIMEOUT_ERR,
> + IB_WC_GENERAL_ERR
> +};
> +
> +#endif /* PVRDMA_IB_VERBS_H */
> diff --git a/hw/net/pvrdma/pvrdma_kdbr.c b/hw/net/pvrdma/pvrdma_kdbr.c
> new file mode 100644
> index 0000000..ec04afd
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_kdbr.c
> @@ -0,0 +1,395 @@
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +
> +#include <sys/ioctl.h>
> +
> +#include <hw/net/pvrdma/pvrdma.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +int kdbr_fd = -1;
> +
> +#define MAX_CONSEQ_CQES_READ 10
> +
> +typedef struct KdbrCtx {
> + struct kdbr_req req;
> + void *up_ctx;
> + bool is_tx_req;
> +} KdbrCtx;
> +
> +static void (*tx_comp_handler)(int status, unsigned int vendor_err,
> + void *ctx) = 0;
> +static void (*rx_comp_handler)(int status, unsigned int vendor_err,
> + void *ctx) = 0;
> +
> +static void kdbr_err_to_pvrdma_err(int kdbr_status, unsigned int *status,
> + unsigned int *vendor_err)
> +{
> + if (kdbr_status == 0) {
> + *status = IB_WC_SUCCESS;
> + *vendor_err = 0;
> + return;
> + }
> +
> + *vendor_err = kdbr_status;
> + switch (kdbr_status) {
> + case KDBR_ERR_CODE_EMPTY_VEC:
> + *status = IB_WC_LOC_LEN_ERR;
> + break;
> + case KDBR_ERR_CODE_NO_MORE_RECV_BUF:
> + *status = IB_WC_REM_OP_ERR;
> + break;
> + case KDBR_ERR_CODE_RECV_BUF_PROT:
> + *status = IB_WC_REM_ACCESS_ERR;
> + break;
> + case KDBR_ERR_CODE_INV_ADDR:
> + *status = IB_WC_LOC_ACCESS_ERR;
> + break;
> + case KDBR_ERR_CODE_INV_CONN_ID:
> + *status = IB_WC_LOC_PROT_ERR;
> + break;
> + case KDBR_ERR_CODE_NO_PEER:
> + *status = IB_WC_LOC_QP_OP_ERR;
> + break;
> + default:
> + *status = IB_WC_GENERAL_ERR;
> + break;
> + }
> +}
> +
> +static void *comp_handler_thread(void *arg)
> +{
> + KdbrPort *port = (KdbrPort *)arg;
> + struct kdbr_completion comp[MAX_CONSEQ_CQES_READ];
> + int i, j, rc;
> + KdbrCtx *sctx;
> + unsigned int status, vendor_err;
> +
> + while (port->comp_thread.run) {
> + rc = read(port->fd, &comp, sizeof(comp));
> + if (unlikely(rc % sizeof(struct kdbr_completion))) {
> + pr_err("Got unsupported message size (%d) from kdbr\n", rc);
> + continue;
> + }
> + pr_dbg("Processing %ld CQEs from kdbr\n",
> + rc / sizeof(struct kdbr_completion));
> +
> + for (i = 0; i < rc / sizeof(struct kdbr_completion); i++) {
> + pr_dbg("comp.req_id=%ld\n", comp[i].req_id);
> + pr_dbg("comp.status=%d\n", comp[i].status);
> +
> + sctx = rm_get_wqe_ctx(PVRDMA_DEV(port->dev), comp[i].req_id);
> + if (!sctx) {
> + pr_err("Fail to find ctx for req %ld\n", comp[i].req_id);
> + continue;
> + }
> + pr_dbg("Processing %s CQE\n", sctx->is_tx_req ? "send" : "recv");
> +
> + for (j = 0; j < sctx->req.vlen; j++) {
> + pr_dbg("payload=%s\n", (char *)sctx->req.vec[j].iov_base);
> + pvrdma_pci_dma_unmap(port->dev, sctx->req.vec[j].iov_base,
> + sctx->req.vec[j].iov_len);
> + }
> +
> + kdbr_err_to_pvrdma_err(comp[i].status, &status, &vendor_err);
> + pr_dbg("status=%d\n", status);
> + pr_dbg("vendor_err=0x%x\n", vendor_err);
> +
> + if (sctx->is_tx_req) {
> + tx_comp_handler(status, vendor_err, sctx->up_ctx);
> + } else {
> + rx_comp_handler(status, vendor_err, sctx->up_ctx);
> + }
> +
> + rm_dealloc_wqe_ctx(PVRDMA_DEV(port->dev), comp[i].req_id);
> + free(sctx);
> + }
> + }
> +
> + pr_dbg("Going down\n");
> +
> + return NULL;
> +}
> +
> +KdbrPort *kdbr_alloc_port(PVRDMADev *dev)
> +{
> + int rc;
> + KdbrPort *port;
> + char name[80] = {0};
> + struct kdbr_reg reg;
> +
> + port = malloc(sizeof(KdbrPort));
> + if (!port) {
> + pr_dbg("Fail to allocate memory for port object\n");
> + return NULL;
> + }
> +
> + port->dev = PCI_DEVICE(dev);
> +
> + pr_dbg("net=0x%llx\n", dev->ports[0].gid_tbl[0].global.subnet_prefix);
> + pr_dbg("guid=0x%llx\n", dev->ports[0].gid_tbl[0].global.interface_id);
> + reg.gid.net_id = dev->ports[0].gid_tbl[0].global.subnet_prefix;
> + reg.gid.id = dev->ports[0].gid_tbl[0].global.interface_id;
> + rc = ioctl(kdbr_fd, KDBR_REGISTER_PORT, ®);
> + if (rc < 0) {
> + pr_err("Fail to allocate port\n");
> + goto err_free_port;
> + }
> +
> + port->num = reg.port;
> +
> + sprintf(name, KDBR_FILE_NAME "%d", port->num);
> + port->fd = open(name, O_RDWR);
> + if (port->fd < 0) {
> + pr_err("Fail to open file %s\n", name);
> + goto err_unregister_device;
> + }
> +
> + sprintf(name, "pvrdma_comp_%d", port->num);
> + port->comp_thread.run = true;
> + qemu_thread_create(&port->comp_thread.thread, name, comp_handler_thread,
> + port, QEMU_THREAD_DETACHED);
> +
> + pr_info("Port %d (fd %d) allocated\n", port->num, port->fd);
> +
> + return port;
> +
> +err_unregister_device:
> + ioctl(kdbr_fd, KDBR_UNREGISTER_PORT, &port->num);
> +
> +err_free_port:
> + free(port);
> +
> + return NULL;
> +}
> +
> +void kdbr_free_port(KdbrPort *port)
> +{
> + int rc;
> +
> + if (!port) {
> + return;
> + }
> +
> + port->comp_thread.run = false;
> + rc = write(port->fd, (char *)0, 1);
> + close(port->fd);
> +
> + rc = ioctl(kdbr_fd, KDBR_UNREGISTER_PORT, &port->num);
> + if (rc < 0) {
> + pr_err("Fail to unregister port\n");
> + }
> +
> + free(port);
> +}
> +
> +unsigned long kdbr_open_connection(KdbrPort *port, u32 qpn,
> + union pvrdma_gid dgid, u32 dqpn, bool rc_qp)
> +{
> + int rc;
> + struct kdbr_connection connection = {0};
> +
> + connection.queue_id = qpn;
> + connection.peer.rgid.net_id = dgid.global.subnet_prefix;
> + connection.peer.rgid.id = dgid.global.interface_id;
> + connection.peer.rqueue = dqpn;
> + connection.ack_type = rc_qp ? KDBR_ACK_DELAYED : KDBR_ACK_IMMEDIATE;
> +
> + rc = ioctl(port->fd, KDBR_PORT_OPEN_CONN, &connection);
> + if (rc <= 0) {
> + pr_err("Fail to open kdbr connection on port %d fd %d err %d\n",
> + port->num, port->fd, rc);
> + return 0;
> + }
> +
> + return (unsigned long)rc;
> +}
> +
> +void kdbr_close_connection(KdbrPort *port, unsigned long connection_id)
> +{
> + int rc;
> +
> + rc = ioctl(port->fd, KDBR_PORT_CLOSE_CONN, &connection_id);
> + if (rc < 0) {
> + pr_err("Fail to close kdbr connection on port %d\n",
> + port->num);
> + }
> +}
> +
> +void kdbr_register_tx_comp_handler(void (*comp_handler)(int status,
> + unsigned int vendor_err, void *ctx))
> +{
> + tx_comp_handler = comp_handler;
> +}
> +
> +void kdbr_register_rx_comp_handler(void (*comp_handler)(int status,
> + unsigned int vendor_err, void *ctx))
> +{
> + rx_comp_handler = comp_handler;
> +}
> +
> +void kdbr_send_wqe(KdbrPort *port, unsigned long connection_id, bool rc_qp,
> + struct RmSqWqe *wqe, void *ctx)
> +{
> + KdbrCtx *sctx;
> + int rc;
> + int i;
> +
> + pr_dbg("kdbr_port=%d\n", port->num);
> + pr_dbg("kdbr_connection_id=%ld\n", connection_id);
> + pr_dbg("wqe->hdr.num_sge=%d\n", wqe->hdr.num_sge);
> +
> + /* Last minute validation - verify that kdbr supports num_sge */
> + /* TODO: Make sure this will not happen! */
> + if (wqe->hdr.num_sge > KDBR_MAX_IOVEC_LEN) {
> + pr_err("Error: requested %d SGEs but kdbr supports only %d\n",
> + wqe->hdr.num_sge, KDBR_MAX_IOVEC_LEN);
> + tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_TOO_MANY_SGES, ctx);
> + return;
> + }
> +
> + sctx = malloc(sizeof(*sctx));
> + if (!sctx) {
> + pr_err("Fail to allocate kdbr request ctx\n");
> + tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> + return;
> + }
> +
> + memset(&sctx->req, 0, sizeof(sctx->req));
> + sctx->req.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_SEND;
> + sctx->req.connection_id = connection_id;
> +
> + sctx->up_ctx = ctx;
> + sctx->is_tx_req = 1;
> +
> + rc = rm_alloc_wqe_ctx(PVRDMA_DEV(port->dev), &sctx->req.req_id, sctx);
> + if (rc != 0) {
> + pr_err("Fail to allocate request ID\n");
> + free(sctx);
> + tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> + return;
> + }
> + sctx->req.vlen = wqe->hdr.num_sge;
> +
> + for (i = 0; i < wqe->hdr.num_sge; i++) {
> + struct pvrdma_sge *sge;
> +
> + sge = &wqe->sge[i];
> +
> + pr_dbg("addr=0x%llx\n", sge->addr);
> + pr_dbg("length=%d\n", sge->length);
> + pr_dbg("lkey=0x%x\n", sge->lkey);
> +
> + sctx->req.vec[i].iov_base = pvrdma_pci_dma_map(port->dev, sge->addr,
> + sge->length);
> + sctx->req.vec[i].iov_len = sge->length;
> + }
> +
> + if (!rc_qp) {
> + sctx->req.peer.rqueue = wqe->hdr.wr.ud.remote_qpn;
> + sctx->req.peer.rgid.net_id = *((unsigned long *)
> + &wqe->hdr.wr.ud.av.dgid[0]);
> + sctx->req.peer.rgid.id = *((unsigned long *)
> + &wqe->hdr.wr.ud.av.dgid[8]);
> + }
> +
> + rc = write(port->fd, &sctx->req, sizeof(sctx->req));
> + if (rc < 0) {
> + pr_err("Fail (%d, %d) to post send WQE to port %d, conn_id %ld\n",
> + rc, errno, port->num, connection_id);
> + tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_FAIL_KDBR, ctx);
> + return;
> + }
> +}
> +
> +void kdbr_recv_wqe(KdbrPort *port, unsigned long connection_id,
> + struct RmRqWqe *wqe, void *ctx)
> +{
> + KdbrCtx *sctx;
> + int rc;
> + int i;
> +
> + pr_dbg("kdbr_port=%d\n", port->num);
> + pr_dbg("kdbr_connection_id=%ld\n", connection_id);
> + pr_dbg("wqe->hdr.num_sge=%d\n", wqe->hdr.num_sge);
> +
> + /* Last minute validation - verify that kdbr supports num_sge */
> + if (wqe->hdr.num_sge > KDBR_MAX_IOVEC_LEN) {
> + pr_err("Error: requested %d SGEs but kdbr supports only %d\n",
> + wqe->hdr.num_sge, KDBR_MAX_IOVEC_LEN);
> + rx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_TOO_MANY_SGES, ctx);
> + return;
> + }
> +
> + sctx = malloc(sizeof(*sctx));
> + if (!sctx) {
> + pr_err("Fail to allocate kdbr request ctx\n");
> + rx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> + return;
> + }
> +
> + memset(&sctx->req, 0, sizeof(sctx->req));
> + sctx->req.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_RECV;
> + sctx->req.connection_id = connection_id;
> +
> + sctx->up_ctx = ctx;
> + sctx->is_tx_req = 0;
> +
> + pr_dbg("sctx=%p\n", sctx);
> + rc = rm_alloc_wqe_ctx(PVRDMA_DEV(port->dev), &sctx->req.req_id, sctx);
> + if (rc != 0) {
> + pr_err("Fail to allocate request ID\n");
> + free(sctx);
> + rx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> + return;
> + }
> +
> + sctx->req.vlen = wqe->hdr.num_sge;
> +
> + for (i = 0; i < wqe->hdr.num_sge; i++) {
> + struct pvrdma_sge *sge;
> +
> + sge = &wqe->sge[i];
> +
> + pr_dbg("addr=0x%llx\n", sge->addr);
> + pr_dbg("length=%d\n", sge->length);
> + pr_dbg("lkey=0x%x\n", sge->lkey);
> +
> + sctx->req.vec[i].iov_base = pvrdma_pci_dma_map(port->dev, sge->addr,
> + sge->length);
> + sctx->req.vec[i].iov_len = sge->length;
> + }
> +
> + rc = write(port->fd, &sctx->req, sizeof(sctx->req));
> + if (rc < 0) {
> + pr_err("Fail (%d, %d) to post recv WQE to port %d, conn_id %ld\n",
> + rc, errno, port->num, connection_id);
> + rx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_FAIL_KDBR, ctx);
> + return;
> + }
> +}
> +
> +static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx)
> +{
> + pr_err("No completion handler is registered\n");
> +}
> +
> +int kdbr_init(void)
> +{
> + kdbr_register_tx_comp_handler(dummy_comp_handler);
> + kdbr_register_rx_comp_handler(dummy_comp_handler);
> +
> + kdbr_fd = open(KDBR_FILE_NAME, 0);
> + if (kdbr_fd < 0) {
> + pr_dbg("Can't connect to kdbr, rc=%d\n", kdbr_fd);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +void kdbr_fini(void)
> +{
> + close(kdbr_fd);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_kdbr.h b/hw/net/pvrdma/pvrdma_kdbr.h
> new file mode 100644
> index 0000000..293a180
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_kdbr.h
> @@ -0,0 +1,53 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA QP Operations
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_KDBR_H
> +#define PVRDMA_KDBR_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +typedef struct KdbrCompThread {
> + QemuThread thread;
> + QemuMutex mutex;
> + bool run;
> +} KdbrCompThread;
> +
> +typedef struct KdbrPort {
> + int num;
> + int fd;
> + KdbrCompThread comp_thread;
> + PCIDevice *dev;
> +} KdbrPort;
> +
> +int kdbr_init(void);
> +void kdbr_fini(void);
> +KdbrPort *kdbr_alloc_port(PVRDMADev *dev);
> +void kdbr_free_port(KdbrPort *port);
> +void kdbr_register_tx_comp_handler(void (*comp_handler)(int status,
> + unsigned int vendor_err, void *ctx));
> +void kdbr_register_rx_comp_handler(void (*comp_handler)(int status,
> + unsigned int vendor_err, void *ctx));
> +unsigned long kdbr_open_connection(KdbrPort *port, u32 qpn,
> + union pvrdma_gid dgid, u32 dqpn,
> + bool rc_qp);
> +void kdbr_close_connection(KdbrPort *port, unsigned long connection_id);
> +void kdbr_send_wqe(KdbrPort *port, unsigned long connection_id, bool rc_qp,
> + struct RmSqWqe *wqe, void *ctx);
> +void kdbr_recv_wqe(KdbrPort *port, unsigned long connection_id,
> + struct RmRqWqe *wqe, void *ctx);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_main.c b/hw/net/pvrdma/pvrdma_main.c
> new file mode 100644
> index 0000000..5db802e
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_main.c
> @@ -0,0 +1,667 @@
> +#include <qemu/osdep.h>
> +#include <hw/hw.h>
> +#include <hw/pci/pci.h>
> +#include <hw/pci/pci_ids.h>
> +#include <hw/pci/msi.h>
> +#include <hw/pci/msix.h>
> +#include <hw/qdev-core.h>
> +#include <hw/qdev-properties.h>
> +#include <cpu.h>
> +
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_defs.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma_dev_api.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +#include "hw/net/pvrdma/pvrdma_qp_ops.h"
> +
> +static Property pvrdma_dev_properties[] = {
> + DEFINE_PROP_UINT64("sys-image-guid", PVRDMADev, sys_image_guid, 0),
> + DEFINE_PROP_UINT64("node-guid", PVRDMADev, node_guid, 0),
> + DEFINE_PROP_UINT64("network-prefix", PVRDMADev, network_prefix, 0),
> + DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void free_dev_ring(PCIDevice *pci_dev, Ring *ring, void *ring_state)
> +{
> + ring_free(ring);
> + pvrdma_pci_dma_unmap(pci_dev, ring_state, TARGET_PAGE_SIZE);
> +}
> +
> +static int init_dev_ring(Ring *ring, struct pvrdma_ring **ring_state,
> + const char *name, PCIDevice *pci_dev,
> + dma_addr_t dir_addr, u32 num_pages)
> +{
> + __u64 *dir, *tbl;
> + int rc = 0;
> +
> + pr_dbg("Initializing device ring %s\n", name);
> + pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)dir_addr);
> + pr_dbg("num_pages=%d\n", num_pages);
> + dir = pvrdma_pci_dma_map(pci_dev, dir_addr, TARGET_PAGE_SIZE);
> + if (!dir) {
> + pr_err("Fail to map to page directory\n");
> + rc = -ENOMEM;
> + goto out;
> + }
> + tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> + if (!tbl) {
> + pr_err("Fail to map to page table\n");
> + rc = -ENOMEM;
> + goto out_free_dir;
> + }
> +
> + *ring_state = pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> + if (!*ring_state) {
> + pr_err("Fail to map to ring state\n");
> + rc = -ENOMEM;
> + goto out_free_tbl;
> + }
> + /* RX ring is the second */
> + (*ring_state)++;
> + rc = ring_init(ring, name, pci_dev, (struct pvrdma_ring *)*ring_state,
> + (num_pages - 1) * TARGET_PAGE_SIZE /
> + sizeof(struct pvrdma_cqne), sizeof(struct pvrdma_cqne),
> + (dma_addr_t *)&tbl[1], (dma_addr_t)num_pages - 1);
> + if (rc != 0) {
> + pr_err("Fail to initialize ring\n");
> + rc = -ENOMEM;
> + goto out_free_ring_state;
> + }
> +
> + goto out_free_tbl;
> +
> +out_free_ring_state:
> + pvrdma_pci_dma_unmap(pci_dev, *ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_tbl:
> + pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> +
> +out_free_dir:
> + pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> +
> +out:
> + return rc;
> +}
> +
> +static void free_dsr(PVRDMADev *dev)
> +{
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> + if (!dev->dsr_info.dsr) {
> + return;
> + }
> +
> + free_dev_ring(pci_dev, &dev->dsr_info.async,
> + dev->dsr_info.async_ring_state);
> +
> + free_dev_ring(pci_dev, &dev->dsr_info.cq, dev->dsr_info.cq_ring_state);
> +
> + pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.req,
> + sizeof(union pvrdma_cmd_req));
> +
> + pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.rsp,
> + sizeof(union pvrdma_cmd_resp));
> +
> + pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.dsr,
> + sizeof(struct pvrdma_device_shared_region));
> +
> + dev->dsr_info.dsr = NULL;
> +}
> +
> +static int load_dsr(PVRDMADev *dev)
> +{
> + int rc = 0;
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> + DSRInfo *dsr_info;
> + struct pvrdma_device_shared_region *dsr;
> +
> + free_dsr(dev);
> +
> + /* Map to DSR */
> + pr_dbg("dsr_dma=0x%llx\n", (long long unsigned int)dev->dsr_info.dma);
> + dev->dsr_info.dsr = pvrdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> + sizeof(struct pvrdma_device_shared_region));
> + if (!dev->dsr_info.dsr) {
> + pr_err("Fail to map to DSR\n");
> + rc = -ENOMEM;
> + goto out;
> + }
> +
> + /* Shortcuts */
> + dsr_info = &dev->dsr_info;
> + dsr = dsr_info->dsr;
> +
> + /* Map to command slot */
> + pr_dbg("cmd_dma=0x%llx\n", (long long unsigned int)dsr->cmd_slot_dma);
> + dsr_info->req = pvrdma_pci_dma_map(pci_dev, dsr->cmd_slot_dma,
> + sizeof(union pvrdma_cmd_req));
> + if (!dsr_info->req) {
> + pr_err("Fail to map to command slot address\n");
> + rc = -ENOMEM;
> + goto out_free_dsr;
> + }
> +
> + /* Map to response slot */
> + pr_dbg("rsp_dma=0x%llx\n", (long long unsigned int)dsr->resp_slot_dma);
> + dsr_info->rsp = pvrdma_pci_dma_map(pci_dev, dsr->resp_slot_dma,
> + sizeof(union pvrdma_cmd_resp));
> + if (!dsr_info->rsp) {
> + pr_err("Fail to map to response slot address\n");
> + rc = -ENOMEM;
> + goto out_free_req;
> + }
> +
> + /* Map to CQ notification ring */
> + rc = init_dev_ring(&dsr_info->cq, &dsr_info->cq_ring_state, "dev_cq",
> + pci_dev, dsr->cq_ring_pages.pdir_dma,
> + dsr->cq_ring_pages.num_pages);
> + if (rc != 0) {
> + pr_err("Fail to initialize CQ ring\n");
> + rc = -ENOMEM;
> + goto out_free_rsp;
> + }
> +
> + /* Map to event notification ring */
> + rc = init_dev_ring(&dsr_info->async, &dsr_info->async_ring_state,
> + "dev_async", pci_dev, dsr->async_ring_pages.pdir_dma,
> + dsr->async_ring_pages.num_pages);
> + if (rc != 0) {
> + pr_err("Fail to initialize event ring\n");
> + rc = -ENOMEM;
> + goto out_free_rsp;
> + }
> +
> + goto out;
> +
> +out_free_rsp:
> + pvrdma_pci_dma_unmap(pci_dev, dsr_info->rsp,
> + sizeof(union pvrdma_cmd_resp));
> +
> +out_free_req:
> + pvrdma_pci_dma_unmap(pci_dev, dsr_info->req,
> + sizeof(union pvrdma_cmd_req));
> +
> +out_free_dsr:
> + pvrdma_pci_dma_unmap(pci_dev, dsr_info->dsr,
> + sizeof(struct pvrdma_device_shared_region));
> + dsr_info->dsr = NULL;
> +
> +out:
> + return rc;
> +}
> +
> +static void init_dev_caps(PVRDMADev *dev)
> +{
> + struct pvrdma_device_shared_region *dsr;
> +
> + if (dev->dsr_info.dsr == NULL) {
> + pr_err("Can't initialize device caps, DSR is not mapped\n");
> + return;
> + }
> +
> + dsr = dev->dsr_info.dsr;
> +
> + dsr->caps.fw_ver = PVRDMA_FW_VERSION;
> + pr_dbg("fw_ver=0x%lx\n", dsr->caps.fw_ver);
> +
> + dsr->caps.mode = PVRDMA_DEVICE_MODE_ROCE;
> + pr_dbg("mode=%d\n", dsr->caps.mode);
> +
> + dsr->caps.gid_types |= PVRDMA_GID_TYPE_FLAG_ROCE_V1;
> + pr_dbg("gid_types=0x%x\n", dsr->caps.gid_types);
> +
> + dsr->caps.max_uar = RDMA_BAR2_UAR_SIZE;
> + pr_dbg("max_uar=%d\n", dsr->caps.max_uar);
> +
> + if (rm_get_max_pds(&dsr->caps.max_pd)) {
> + return;
> + }
> + pr_dbg("max_pd=%d\n", dsr->caps.max_pd);
> +
> + if (rm_get_max_gids(&dsr->caps.gid_tbl_len)) {
> + return;
> + }
> + pr_dbg("gid_tbl_len=%d\n", dsr->caps.gid_tbl_len);
> +
> + if (rm_get_max_cqs(&dsr->caps.max_cq)) {
> + return;
> + }
> + pr_dbg("max_cq=%d\n", dsr->caps.max_cq);
> +
> + if (rm_get_max_cqes(&dsr->caps.max_cqe)) {
> + return;
> + }
> + pr_dbg("max_cqe=%d\n", dsr->caps.max_cqe);
> +
> + if (rm_get_max_qps(&dsr->caps.max_qp)) {
> + return;
> + }
> + pr_dbg("max_qp=%d\n", dsr->caps.max_qp);
> +
> + dsr->caps.sys_image_guid = cpu_to_be64(dev->sys_image_guid);
> + pr_dbg("sys_image_guid=%llx\n",
> + (long long unsigned int)be64_to_cpu(dsr->caps.sys_image_guid));
> +
> + dsr->caps.node_guid = cpu_to_be64(dev->node_guid);
> + pr_dbg("node_guid=%llx\n",
> + (long long unsigned int)be64_to_cpu(dsr->caps.node_guid));
> +
> + if (rm_get_phys_port_cnt(&dsr->caps.phys_port_cnt)) {
> + return;
> + }
> + pr_dbg("phys_port_cnt=%d\n", dsr->caps.phys_port_cnt);
> +
> + if (rm_get_max_qp_wrs(&dsr->caps.max_qp_wr)) {
> + return;
> + }
> + pr_dbg("max_qp_wr=%d\n", dsr->caps.max_qp_wr);
> +
> + if (rm_get_max_sges(&dsr->caps.max_sge)) {
> + return;
> + }
> + pr_dbg("max_sge=%d\n", dsr->caps.max_sge);
> +
> + if (rm_get_max_mrs(&dsr->caps.max_mr)) {
> + return;
> + }
> + pr_dbg("max_mr=%d\n", dsr->caps.max_mr);
> +
> + if (rm_get_max_pkeys(&dsr->caps.max_pkeys)) {
> + return;
> + }
> + pr_dbg("max_pkeys=%d\n", dsr->caps.max_pkeys);
> +
> + if (rm_get_max_ah(&dsr->caps.max_ah)) {
> + return;
> + }
> + pr_dbg("max_ah=%d\n", dsr->caps.max_ah);
> +
> + pr_dbg("Initialized\n");
> +}
> +
> +static void free_ports(PVRDMADev *dev)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_PORTS; i++) {
> + free(dev->ports[i].gid_tbl);
> + kdbr_free_port(dev->ports[i].kdbr_port);
> + }
> +}
> +
> +static int init_ports(PVRDMADev *dev)
> +{
> + int i, ret = 0;
> + __u32 max_port_gids;
> + __u32 max_port_pkeys;
> +
> + memset(dev->ports, 0, sizeof(dev->ports));
> +
> + ret = rm_get_max_port_gids(&max_port_gids);
> + if (ret != 0) {
> + goto err;
> + }
> +
> + ret = rm_get_max_port_pkeys(&max_port_pkeys);
> + if (ret != 0) {
> + goto err;
> + }
> +
> + for (i = 0; i < MAX_PORTS; i++) {
> + dev->ports[i].state = PVRDMA_PORT_DOWN;
> +
> + dev->ports[i].pkey_tbl = malloc(sizeof(*dev->ports[i].pkey_tbl) *
> + max_port_pkeys);
> + if (dev->ports[i].pkey_tbl == NULL) {
> + goto err_free_ports;
> + }
> + memset(dev->ports[i].pkey_tbl, 0,
> + sizeof(*dev->ports[i].pkey_tbl) * max_port_pkeys);
> +
> + dev->ports[i].gid_tbl = malloc(sizeof(*dev->ports[i].gid_tbl) *
> + max_port_gids);
> + if (dev->ports[i].gid_tbl == NULL) {
> + goto err_free_ports;
> + }
> + memset(dev->ports[i].gid_tbl, 0,
> + sizeof(*dev->ports[i].gid_tbl) * max_port_gids);
> + }
> +
> + return 0;
> +
> +err_free_ports:
> + free_ports(dev);
> +
> +err:
> + pr_err("Fail to initialize device's ports\n");
> +
> + return ret;
> +}
> +
> +static void activate_device(PVRDMADev *dev)
> +{
> + set_reg_val(dev, PVRDMA_REG_ERR, 0);
> + pr_dbg("Device activated\n");
> +}
> +
> +static int quiesce_device(PVRDMADev *dev)
> +{
> + pr_dbg("Device quiesced\n");
> + return 0;
> +}
> +
> +static int reset_device(PVRDMADev *dev)
> +{
> + pr_dbg("Device reset complete\n");
> + return 0;
> +}
> +
> +static uint64_t regs_read(void *opaque, hwaddr addr, unsigned size)
> +{
> + PVRDMADev *dev = opaque;
> + __u32 val;
> +
> + /* pr_dbg("addr=0x%lx, size=%d\n", addr, size); */
> +
> + if (get_reg_val(dev, addr, &val)) {
> + pr_dbg("Error trying to read REG value from address 0x%x\n",
> + (__u32)addr);
> + return -EINVAL;
> + }
> +
> + /* pr_dbg("regs[0x%x]=0x%x\n", (__u32)addr, val); */
> +
> + return val;
> +}
> +
> +static void regs_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> +{
> + PVRDMADev *dev = opaque;
> +
> + /* pr_dbg("addr=0x%lx, val=0x%x, size=%d\n", addr, (uint32_t)val, size); */
> +
> + if (set_reg_val(dev, addr, val)) {
> + pr_err("Error trying to set REG value, addr=0x%x, val=0x%lx\n",
> + (__u32)addr, val);
> + return;
> + }
> +
> + /* pr_dbg("regs[0x%x]=0x%lx\n", (__u32)addr, val); */
> +
> + switch (addr) {
> + case PVRDMA_REG_DSRLOW:
> + dev->dsr_info.dma = val;
> + break;
> + case PVRDMA_REG_DSRHIGH:
> + dev->dsr_info.dma |= val << 32;
> + load_dsr(dev);
> + init_dev_caps(dev);
> + break;
> + case PVRDMA_REG_CTL:
> + switch (val) {
> + case PVRDMA_DEVICE_CTL_ACTIVATE:
> + activate_device(dev);
> + break;
> + case PVRDMA_DEVICE_CTL_QUIESCE:
> + quiesce_device(dev);
> + break;
> + case PVRDMA_DEVICE_CTL_RESET:
> + reset_device(dev);
> + break;
> + }
> + break;
> + case PVRDMA_REG_IMR:
> + pr_dbg("Interrupt mask=0x%lx\n", val);
> + dev->interrupt_mask = val;
> + break;
> + case PVRDMA_REG_REQUEST:
> + if (val == 0) {
> + execute_command(dev);
> + }
> + break;
> + default:
> + break;
> + }
> +}
> +
> +static const MemoryRegionOps regs_ops = {
> + .read = regs_read,
> + .write = regs_write,
> + .endianness = DEVICE_LITTLE_ENDIAN,
> + .impl = {
> + .min_access_size = sizeof(uint32_t),
> + .max_access_size = sizeof(uint32_t),
> + },
> +};
> +
> +static uint64_t uar_read(void *opaque, hwaddr addr, unsigned size)
> +{
> + PVRDMADev *dev = opaque;
> + __u32 val;
> +
> + pr_dbg("addr=0x%lx, size=%d\n", addr, size);
> +
> + if (get_uar_val(dev, addr, &val)) {
> + pr_dbg("Error trying to read UAR value from address 0x%x\n",
> + (__u32)addr);
> + return -EINVAL;
> + }
> +
> + pr_dbg("uar[0x%x]=0x%x\n", (__u32)addr, val);
> +
> + return val;
> +}
> +
> +static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> +{
> + PVRDMADev *dev = opaque;
> +
> + /* pr_dbg("addr=0x%lx, val=0x%x, size=%d\n", addr, (uint32_t)val, size); */
> +
> + if (set_uar_val(dev, addr, val)) {
> + pr_err("Error trying to set UAR value, addr=0x%x, val=0x%lx\n",
> + (__u32)addr, val);
> + return;
> + }
> +
> + /* pr_dbg("uar[0x%x]=0x%lx\n", (__u32)addr, val); */
> +
> + switch (addr) {
> + case PVRDMA_UAR_QP_OFFSET:
> + pr_dbg("UAR QP command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> + if (val & PVRDMA_UAR_QP_SEND) {
> + qp_send(dev, val & PVRDMA_UAR_HANDLE_MASK);
> + }
> + if (val & PVRDMA_UAR_QP_RECV) {
> + qp_recv(dev, val & PVRDMA_UAR_HANDLE_MASK);
> + }
> + break;
> + case PVRDMA_UAR_CQ_OFFSET:
> + pr_dbg("UAR CQ command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> + rm_req_notify_cq(dev, val & PVRDMA_UAR_HANDLE_MASK,
> + val & ~PVRDMA_UAR_HANDLE_MASK);
> + break;
> + default:
> + pr_err("Unsupported command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> + break;
> + }
> +}
> +
> +static const MemoryRegionOps uar_ops = {
> + .read = uar_read,
> + .write = uar_write,
> + .endianness = DEVICE_LITTLE_ENDIAN,
> + .impl = {
> + .min_access_size = sizeof(uint32_t),
> + .max_access_size = sizeof(uint32_t),
> + },
> +};
> +
> +static void init_pci_config(PCIDevice *pdev)
> +{
> + pdev->config[PCI_INTERRUPT_PIN] = 1;
> +}
> +
> +static void init_bars(PCIDevice *pdev)
> +{
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> + /* BAR 0 - MSI-X */
> + memory_region_init(&dev->msix, OBJECT(dev), "pvrdma-msix",
> + RDMA_BAR0_MSIX_SIZE);
> + pci_register_bar(pdev, RDMA_MSIX_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> + &dev->msix);
> +
> + /* BAR 1 - Registers */
> + memset(&dev->regs_data, 0, RDMA_BAR1_REGS_SIZE);
> + memory_region_init_io(&dev->regs, OBJECT(dev), ®s_ops, dev,
> + "pvrdma-regs", RDMA_BAR1_REGS_SIZE);
> + pci_register_bar(pdev, RDMA_REG_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> + &dev->regs);
> +
> + /* BAR 2 - UAR */
> + memset(&dev->uar_data, 0, RDMA_BAR2_UAR_SIZE);
> + memory_region_init_io(&dev->uar, OBJECT(dev), &uar_ops, dev, "rdma-uar",
> + RDMA_BAR2_UAR_SIZE);
> + pci_register_bar(pdev, RDMA_UAR_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> + &dev->uar);
> +}
> +
> +static void init_regs(PCIDevice *pdev)
> +{
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> + set_reg_val(dev, PVRDMA_REG_VERSION, PVRDMA_HW_VERSION);
> + set_reg_val(dev, PVRDMA_REG_ERR, 0xFFFF);
> +}
> +
> +static void uninit_msix(PCIDevice *pdev, int used_vectors)
> +{
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> + int i;
> +
> + for (i = 0; i < used_vectors; i++) {
> + msix_vector_unuse(pdev, i);
> + }
> +
> + msix_uninit(pdev, &dev->msix, &dev->msix);
> +}
> +
> +static int init_msix(PCIDevice *pdev)
> +{
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> + int i;
> + int rc;
> +
> + rc = msix_init(pdev, RDMA_MAX_INTRS, &dev->msix, RDMA_MSIX_BAR_IDX,
> + RDMA_MSIX_TABLE, &dev->msix, RDMA_MSIX_BAR_IDX,
> + RDMA_MSIX_PBA, 0, NULL);
> +
> + if (rc < 0) {
> + pr_err("Fail to initialize MSI-X\n");
> + return rc;
> + }
> +
> + for (i = 0; i < RDMA_MAX_INTRS; i++) {
> + rc = msix_vector_use(PCI_DEVICE(dev), i);
> + if (rc < 0) {
> + pr_err("Fail to mark MSI-X vector %d as used\n", i);
> + uninit_msix(pdev, i);
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int pvrdma_init(PCIDevice *pdev)
> +{
> + int rc;
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> + pr_info("Initializing device %s %x.%x\n", pdev->name,
> + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +
> + dev->dsr_info.dsr = NULL;
> +
> + init_pci_config(pdev);
> +
> + init_bars(pdev);
> +
> + init_regs(pdev);
> +
> + rc = init_msix(pdev);
> + if (rc != 0) {
> + goto out;
> + }
> +
> + rc = kdbr_init();
> + if (rc != 0) {
> + goto out;
> + }
> +
> + rc = rm_init(dev);
> + if (rc != 0) {
> + goto out;
> + }
> +
> + rc = init_ports(dev);
> + if (rc != 0) {
> + goto out;
> + }
> +
> + rc = qp_ops_init();
> + if (rc != 0) {
> + goto out;
> + }
> +
> +out:
> + if (rc != 0) {
> + pr_err("Device failed to load\n");
> + }
> +
> + return rc;
> +}
> +
> +static void pvrdma_exit(PCIDevice *pdev)
> +{
> + PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> + pr_info("Closing device %s %x.%x\n", pdev->name,
> + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +
> + qp_ops_fini();
> +
> + free_ports(dev);
> +
> + rm_fini(dev);
> +
> + kdbr_fini();
> +
> + free_dsr(dev);
> +
> + if (msix_enabled(pdev)) {
> + uninit_msix(pdev, RDMA_MAX_INTRS);
> + }
> +}
> +
> +static void pvrdma_class_init(ObjectClass *klass, void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> + k->init = pvrdma_init;
> + k->exit = pvrdma_exit;
> + k->vendor_id = PCI_VENDOR_ID_VMWARE;
> + k->device_id = PCI_DEVICE_ID_VMWARE_PVRDMA;
> + k->revision = 0x00;
> + k->class_id = PCI_CLASS_NETWORK_OTHER;
> +
> + dc->desc = "RDMA Device";
> + dc->props = pvrdma_dev_properties;
> + set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> +}
> +
> +static const TypeInfo pvrdma_info = {
> + .name = PVRDMA_HW_NAME,
> + .parent = TYPE_PCI_DEVICE,
> + .instance_size = sizeof(PVRDMADev),
> + .class_init = pvrdma_class_init,
> +};
> +
> +static void register_types(void)
> +{
> + type_register_static(&pvrdma_info);
> +}
> +
> +type_init(register_types)
> diff --git a/hw/net/pvrdma/pvrdma_qp_ops.c b/hw/net/pvrdma/pvrdma_qp_ops.c
> new file mode 100644
> index 0000000..2db45d9
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_qp_ops.c
> @@ -0,0 +1,174 @@
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma_qp_ops.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma-uapi.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +#include "sysemu/dma.h"
> +#include "hw/pci/pci.h"
> +
> +typedef struct CompHandlerCtx {
> + PVRDMADev *dev;
> + u32 cq_handle;
> + struct pvrdma_cqe cqe;
> +} CompHandlerCtx;
> +
> +/*
> + * 1. Put CQE on send CQ ring
> + * 2. Put CQ number on dsr completion ring
> + * 3. Interrupt host
> + */
> +static int post_cqe(PVRDMADev *dev, u32 cq_handle, struct pvrdma_cqe *cqe)
> +{
> + struct pvrdma_cqe *cqe1;
> + struct pvrdma_cqne *cqne;
> + RmCQ *cq = rm_get_cq(dev, cq_handle);
> +
> + if (!cq) {
> + pr_dbg("Invalid cqn %d\n", cq_handle);
> + return -EINVAL;
> + }
> +
> + pr_dbg("cq->comp_type=%d\n", cq->comp_type);
> + if (cq->comp_type == CCT_NONE) {
> + return 0;
> + }
> + cq->comp_type = CCT_NONE;
> +
> + /* Step #1: Put CQE on CQ ring */
> + pr_dbg("Writing CQE\n");
> + cqe1 = ring_next_elem_write(&cq->cq);
> + if (!cqe1) {
> + return -EINVAL;
> + }
> +
> + memcpy(cqe1, cqe, sizeof(*cqe));
> + ring_write_inc(&cq->cq);
> +
> + /* Step #2: Put CQ number on dsr completion ring */
> + pr_dbg("Writing CQNE\n");
> + cqne = ring_next_elem_write(&dev->dsr_info.cq);
> + if (!cqne) {
> + return -EINVAL;
> + }
> +
> + cqne->info = cq_handle;
> + ring_write_inc(&dev->dsr_info.cq);
> +
> + post_interrupt(dev, INTR_VEC_CMD_COMPLETION_Q);
> +
> + return 0;
> +}
> +
> +static void qp_ops_comp_handler(int status, unsigned int vendor_err, void
> *ctx)
> +{
> + CompHandlerCtx *comp_ctx = (CompHandlerCtx *)ctx;
> +
> + pr_dbg("cq_handle=%d\n", comp_ctx->cq_handle);
> + pr_dbg("wr_id=%lld\n", comp_ctx->cqe.wr_id);
> + pr_dbg("status=%d\n", status);
> + pr_dbg("vendor_err=0x%x\n", vendor_err);
> + comp_ctx->cqe.status = status;
> + comp_ctx->cqe.vendor_err = vendor_err;
> + post_cqe(comp_ctx->dev, comp_ctx->cq_handle, &comp_ctx->cqe);
> + free(ctx);
> +}
> +
> +void qp_ops_fini(void)
> +{
> +}
> +
> +int qp_ops_init(void)
> +{
> + kdbr_register_tx_comp_handler(qp_ops_comp_handler);
> + kdbr_register_rx_comp_handler(qp_ops_comp_handler);
> +
> + return 0;
> +}
> +
> +int qp_send(PVRDMADev *dev, __u32 qp_handle)
> +{
> + RmQP *qp;
> + RmSqWqe *wqe;
> +
> + qp = rm_get_qp(dev, qp_handle);
> + if (!qp) {
> + return -EINVAL;
> + }
> +
> + if (qp->qp_state < PVRDMA_QPS_RTS) {
> + pr_dbg("Invalid QP state for send\n");
> + return -EINVAL;
> + }
> +
> + wqe = (struct RmSqWqe *)ring_next_elem_read(&qp->sq);
> + while (wqe) {
> + CompHandlerCtx *comp_ctx;
> +
> + pr_dbg("wr_id=%lld\n", wqe->hdr.wr_id);
> + wqe->hdr.num_sge = MIN(wqe->hdr.num_sge,
> + qp->init_args.max_send_sge);
> +
> + /* Prepare CQE */
> + comp_ctx = malloc(sizeof(CompHandlerCtx));
> + comp_ctx->dev = dev;
> + comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
> + comp_ctx->cqe.qp = qp_handle;
> + comp_ctx->cq_handle = qp->init_args.send_cq_handle;
> + comp_ctx->cqe.opcode = wqe->hdr.opcode;
> + /* TODO: Fill rest of the data */
> +
> + kdbr_send_wqe(dev->ports[qp->port_num].kdbr_port,
> + qp->kdbr_connection_id,
> + qp->init_args.qp_type == PVRDMA_QPT_RC, wqe, comp_ctx);
> +
> + ring_read_inc(&qp->sq);
> +
> + wqe = ring_next_elem_read(&qp->sq);
> + }
> +
> + return 0;
> +}
> +
> +int qp_recv(PVRDMADev *dev, __u32 qp_handle)
> +{
> + RmQP *qp;
> + RmRqWqe *wqe;
> +
> + qp = rm_get_qp(dev, qp_handle);
> + if (!qp) {
> + return -EINVAL;
> + }
> +
> + if (qp->qp_state < PVRDMA_QPS_RTR) {
> + pr_dbg("Invalid QP state for receive\n");
> + return -EINVAL;
> + }
> +
> + wqe = (struct RmRqWqe *)ring_next_elem_read(&qp->rq);
> + while (wqe) {
> + CompHandlerCtx *comp_ctx;
> +
> + pr_dbg("wr_id=%lld\n", wqe->hdr.wr_id);
> + wqe->hdr.num_sge = MIN(wqe->hdr.num_sge,
> + qp->init_args.max_recv_sge);
> +
> + /* Prepare CQE */
> + comp_ctx = malloc(sizeof(CompHandlerCtx));
> + comp_ctx->dev = dev;
> + comp_ctx->cqe.qp = qp_handle;
> + comp_ctx->cq_handle = qp->init_args.recv_cq_handle;
> + comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
> + /* TODO: Fill rest of the data */
> +
> + kdbr_recv_wqe(dev->ports[qp->port_num].kdbr_port,
> + qp->kdbr_connection_id, wqe, comp_ctx);
> +
> + ring_read_inc(&qp->rq);
> +
> + wqe = ring_next_elem_read(&qp->rq);
> + }
> +
> + return 0;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_qp_ops.h b/hw/net/pvrdma/pvrdma_qp_ops.h
> new file mode 100644
> index 0000000..20125d6
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_qp_ops.h
> @@ -0,0 +1,25 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA QP Operations
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_QP_OPS_H
> +#define PVRDMA_QP_OPS_H
> +
> +typedef struct PVRDMADev PVRDMADev;
> +
> +int qp_ops_init(void);
> +void qp_ops_fini(void);
> +int qp_send(PVRDMADev *dev, __u32 qp_handle);
> +int qp_recv(PVRDMADev *dev, __u32 qp_handle);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_ring.c b/hw/net/pvrdma/pvrdma_ring.c
> new file mode 100644
> index 0000000..34dc1f5
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ring.c
> @@ -0,0 +1,127 @@
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +#include <cpu.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +
> +int ring_init(Ring *ring, const char *name, PCIDevice *dev,
> + struct pvrdma_ring *ring_state, size_t max_elems, size_t
> elem_sz,
> + dma_addr_t *tbl, dma_addr_t npages)
> +{
> + int i;
> + int rc = 0;
> +
> + strncpy(ring->name, name, MAX_RING_NAME_SZ);
> + ring->name[MAX_RING_NAME_SZ - 1] = 0;
> + pr_info("Initializing %s ring\n", ring->name);
> + ring->dev = dev;
> + ring->ring_state = ring_state;
> + ring->max_elems = max_elems;
> + ring->elem_sz = elem_sz;
> + pr_dbg("ring->elem_sz=%ld\n", ring->elem_sz);
> + pr_dbg("npages=%ld\n", npages);
> + /* TODO: Decide whether the driver's ring state should be reset here:
> + atomic_set(&ring->ring_state->prod_tail, 0);
> + atomic_set(&ring->ring_state->cons_head, 0);
> + */
> + ring->npages = npages;
> + ring->pages = malloc(npages * sizeof(void *));
> + if (!ring->pages) {
> + return -ENOMEM;
> + }
> + for (i = 0; i < npages; i++) {
> + if (!tbl[i]) {
> + pr_err("npages=%ld but tbl[%d] is NULL\n", npages, i);
> + ring->pages[i] = NULL;
> + continue;
> + }
> +
> + ring->pages[i] = pvrdma_pci_dma_map(dev, tbl[i], TARGET_PAGE_SIZE);
> + if (!ring->pages[i]) {
> + rc = -ENOMEM;
> + pr_err("Failed to map page %d\n", i);
> + goto out_free;
> + }
> + }
> +
> + goto out;
> +
> +out_free:
> + while (i--) {
> + pvrdma_pci_dma_unmap(dev, ring->pages[i], TARGET_PAGE_SIZE);
> + }
> + free(ring->pages);
> +
> +out:
> + return rc;
> +}
> +
> +void *ring_next_elem_read(Ring *ring)
> +{
> + unsigned int idx = 0, offset;
> +
> + /*
> + pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
> + ring->ring_state->cons_head);
> + */
> +
> + if (!pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx)) {
> + pr_dbg("No more data in ring\n");
> + return NULL;
> + }
> +
> + offset = idx * ring->elem_sz;
> + /*
> + pr_dbg("idx=%d\n", idx);
> + pr_dbg("offset=%d\n", offset);
> + */
> + return ring->pages[offset / TARGET_PAGE_SIZE] + (offset %
> TARGET_PAGE_SIZE);
> +}
> +
> +void ring_read_inc(Ring *ring)
> +{
> + pvrdma_idx_ring_inc(&ring->ring_state->cons_head, ring->max_elems);
> + /*
> + pr_dbg("%s: t=%d, h=%d, m=%ld\n", ring->name,
> + ring->ring_state->prod_tail, ring->ring_state->cons_head,
> + ring->max_elems);
> + */
> +}
> +
> +void *ring_next_elem_write(Ring *ring)
> +{
> + unsigned int idx, offset, tail;
> +
> + /*
> + pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
> + ring->ring_state->cons_head);
> + */
> +
> + if (!pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems,
> &tail)) {
> + pr_dbg("CQ is full\n");
> + return NULL;
> + }
> +
> + idx = pvrdma_idx(&ring->ring_state->prod_tail, ring->max_elems);
> + /* TODO: tail == idx */
> +
> + offset = idx * ring->elem_sz;
> + return ring->pages[offset / TARGET_PAGE_SIZE] + (offset %
> TARGET_PAGE_SIZE);
> +}
> +
> +void ring_write_inc(Ring *ring)
> +{
> + pvrdma_idx_ring_inc(&ring->ring_state->prod_tail, ring->max_elems);
> + /*
> + pr_dbg("%s: t=%d, h=%d, m=%ld\n", ring->name,
> + ring->ring_state->prod_tail, ring->ring_state->cons_head,
> + ring->max_elems);
> + */
> +}
> +
> +void ring_free(Ring *ring)
> +{
> + while (ring->npages--) {
> + pvrdma_pci_dma_unmap(ring->dev, ring->pages[ring->npages],
> + TARGET_PAGE_SIZE);
> + }
> +
> + free(ring->pages);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_ring.h b/hw/net/pvrdma/pvrdma_ring.h
> new file mode 100644
> index 0000000..8a0c448
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ring.h
> @@ -0,0 +1,43 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_RING_H
> +#define PVRDMA_RING_H
> +
> +#include <qemu/typedefs.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +
> +#define MAX_RING_NAME_SZ 16
> +
> +typedef struct Ring {
> + char name[MAX_RING_NAME_SZ];
> + PCIDevice *dev;
> + size_t max_elems;
> + size_t elem_sz;
> + struct pvrdma_ring *ring_state;
> + int npages;
> + void **pages;
> +} Ring;
> +
> +int ring_init(Ring *ring, const char *name, PCIDevice *dev,
> + struct pvrdma_ring *ring_state, size_t max_elems, size_t
> elem_sz,
> + dma_addr_t *tbl, dma_addr_t npages);
> +void *ring_next_elem_read(Ring *ring);
> +void ring_read_inc(Ring *ring);
> +void *ring_next_elem_write(Ring *ring);
> +void ring_write_inc(Ring *ring);
> +void ring_free(Ring *ring);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_rm.c b/hw/net/pvrdma/pvrdma_rm.c
> new file mode 100644
> index 0000000..55ca1e5
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_rm.c
> @@ -0,0 +1,529 @@
> +#include <hw/net/pvrdma/pvrdma.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <qemu/bitmap.h>
> +#include <qemu/atomic.h>
> +#include <cpu.h>
> +
> +/* Page directory and page tables */
> +#define PG_DIR_SZ (TARGET_PAGE_SIZE / sizeof(__u64))
> +#define PG_TBL_SZ (TARGET_PAGE_SIZE / sizeof(__u64))
> +
> +/* Global local and remote keys */
> +__u64 global_lkey = 1;
> +__u64 global_rkey = 1;
> +
> +static inline int res_tbl_init(const char *name, RmResTbl *tbl, u32 tbl_sz,
> + u32 res_sz)
> +{
> + tbl->tbl = malloc(tbl_sz * res_sz);
> + if (!tbl->tbl) {
> + return -ENOMEM;
> + }
> +
> + strncpy(tbl->name, name, MAX_RMRESTBL_NAME_SZ);
> + tbl->name[MAX_RMRESTBL_NAME_SZ - 1] = 0;
> +
> + tbl->bitmap = bitmap_new(tbl_sz);
> + tbl->tbl_sz = tbl_sz;
> + tbl->res_sz = res_sz;
> + qemu_mutex_init(&tbl->lock);
> +
> + return 0;
> +}
> +
> +static inline void res_tbl_free(RmResTbl *tbl)
> +{
> + qemu_mutex_destroy(&tbl->lock);
> + free(tbl->tbl);
> + bitmap_zero_extend(tbl->bitmap, tbl->tbl_sz, 0);
> +}
> +
> +static inline void *res_tbl_get(RmResTbl *tbl, u32 handle)
> +{
> + pr_dbg("%s, handle=%d\n", tbl->name, handle);
> +
> + if ((handle < tbl->tbl_sz) && (test_bit(handle, tbl->bitmap))) {
> + return tbl->tbl + handle * tbl->res_sz;
> + } else {
> + pr_dbg("Invalid handle %d\n", handle);
> + return NULL;
> + }
> +}
> +
> +static inline void *res_tbl_alloc(RmResTbl *tbl, u32 *handle)
> +{
> + qemu_mutex_lock(&tbl->lock);
> +
> + *handle = find_first_zero_bit(tbl->bitmap, tbl->tbl_sz);
> + if (*handle >= tbl->tbl_sz) {
> + pr_dbg("Failed to alloc, bitmap is full\n");
> + qemu_mutex_unlock(&tbl->lock);
> + return NULL;
> + }
> +
> + set_bit(*handle, tbl->bitmap);
> +
> + qemu_mutex_unlock(&tbl->lock);
> +
> + pr_dbg("%s, handle=%d\n", tbl->name, *handle);
> +
> + return tbl->tbl + *handle * tbl->res_sz;
> +}
> +
> +static inline void res_tbl_dealloc(RmResTbl *tbl, u32 handle)
> +{
> + pr_dbg("%s, handle=%d\n", tbl->name, handle);
> +
> + qemu_mutex_lock(&tbl->lock);
> +
> + if (handle < tbl->tbl_sz) {
> + clear_bit(handle, tbl->bitmap);
> + }
> +
> + qemu_mutex_unlock(&tbl->lock);
> +}
> +
> +int rm_alloc_pd(PVRDMADev *dev, __u32 *pd_handle, __u32 ctx_handle)
> +{
> + RmPD *pd;
> +
> + pd = res_tbl_alloc(&dev->pd_tbl, pd_handle);
> + if (!pd) {
> + return -ENOMEM;
> + }
> +
> + pd->ctx_handle = ctx_handle;
> +
> + return 0;
> +}
> +
> +void rm_dealloc_pd(PVRDMADev *dev, __u32 pd_handle)
> +{
> + res_tbl_dealloc(&dev->pd_tbl, pd_handle);
> +}
> +
> +RmCQ *rm_get_cq(PVRDMADev *dev, __u32 cq_handle)
> +{
> + return res_tbl_get(&dev->cq_tbl, cq_handle);
> +}
> +
> +int rm_alloc_cq(PVRDMADev *dev, struct pvrdma_cmd_create_cq *cmd,
> + struct pvrdma_cmd_create_cq_resp *resp)
> +{
> + int rc = 0;
> + RmCQ *cq;
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> + __u64 *dir = 0, *tbl = 0;
> + char ring_name[MAX_RING_NAME_SZ];
> + u32 cqe;
> +
> + cq = res_tbl_alloc(&dev->cq_tbl, &resp->cq_handle);
> + if (!cq) {
> + return -ENOMEM;
> + }
> +
> + memset(cq, 0, sizeof(RmCQ));
> +
> + memcpy(&cq->init_args, cmd, sizeof(*cmd));
> + cq->comp_type = CCT_NONE;
> +
> + /* Get pointer to CQ */
> + dir = pvrdma_pci_dma_map(pci_dev, cq->init_args.pdir_dma,
> TARGET_PAGE_SIZE);
> + if (!dir) {
> + pr_err("Failed to map CQ page directory\n");
> + rc = -ENOMEM;
> + goto out_free_cq;
> + }
> + tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> + if (!tbl) {
> + pr_err("Failed to map CQ page table\n");
> + rc = -ENOMEM;
> + goto out_free_cq;
> + }
> +
> + cq->ring_state = (struct pvrdma_ring *)
> + pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> + if (!cq->ring_state) {
> + pr_err("Failed to map CQ header page\n");
> + rc = -ENOMEM;
> + goto out_free_cq;
> + }
> +
> + sprintf(ring_name, "cq%d", resp->cq_handle);
> + cqe = MIN(cmd->cqe, dev->dsr_info.dsr->caps.max_cqe);
> + rc = ring_init(&cq->cq, ring_name, pci_dev, &cq->ring_state[1],
> + cqe, sizeof(struct pvrdma_cqe), (dma_addr_t *)&tbl[1],
> + cmd->nchunks - 1 /* first page is ring state */);
> + if (rc != 0) {
> + pr_err("Failed to initialize CQ ring\n");
> + rc = -ENOMEM;
> + goto out_free_ring_state;
> + }
> +
> +
> + resp->cqe = cmd->cqe;
> +
> + goto out;
> +
> +out_free_ring_state:
> + pvrdma_pci_dma_unmap(pci_dev, cq->ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_cq:
> + rm_dealloc_cq(dev, resp->cq_handle);
> +
> +out:
> + if (tbl) {
> + pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> + }
> + if (dir) {
> + pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> + }
> +
> + return rc;
> +}
> +
> +void rm_req_notify_cq(PVRDMADev *dev, __u32 cq_handle, u32 flags)
> +{
> + RmCQ *cq;
> +
> + pr_dbg("cq_handle=%d, flags=0x%x\n", cq_handle, flags);
> +
> + cq = rm_get_cq(dev, cq_handle);
> + if (!cq) {
> + return;
> + }
> +
> + cq->comp_type = (flags & PVRDMA_UAR_CQ_ARM_SOL) ? CCT_SOLICITED :
> + CCT_NEXT_COMP;
> + pr_dbg("comp_type=%d\n", cq->comp_type);
> +}
> +
> +void rm_dealloc_cq(PVRDMADev *dev, __u32 cq_handle)
> +{
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> + RmCQ *cq;
> +
> + cq = rm_get_cq(dev, cq_handle);
> + if (!cq) {
> + return;
> + }
> +
> + ring_free(&cq->cq);
> + pvrdma_pci_dma_unmap(pci_dev, cq->ring_state, TARGET_PAGE_SIZE);
> + res_tbl_dealloc(&dev->cq_tbl, cq_handle);
> +}
> +
> +int rm_alloc_mr(PVRDMADev *dev, struct pvrdma_cmd_create_mr *cmd,
> + struct pvrdma_cmd_create_mr_resp *resp)
> +{
> + RmMR *mr;
> +
> + mr = res_tbl_alloc(&dev->mr_tbl, &resp->mr_handle);
> + if (!mr) {
> + return -ENOMEM;
> + }
> +
> + mr->pd_handle = cmd->pd_handle;
> + resp->lkey = mr->lkey = global_lkey++;
> + resp->rkey = mr->rkey = global_rkey++;
> +
> + return 0;
> +}
> +
> +void rm_dealloc_mr(PVRDMADev *dev, __u32 mr_handle)
> +{
> + res_tbl_dealloc(&dev->mr_tbl, mr_handle);
> +}
> +
> +int rm_alloc_qp(PVRDMADev *dev, struct pvrdma_cmd_create_qp *cmd,
> + struct pvrdma_cmd_create_qp_resp *resp)
> +{
> + int rc = 0;
> + RmQP *qp;
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> + __u64 *dir = 0, *tbl = 0;
> + int wqe_size;
> + char ring_name[MAX_RING_NAME_SZ];
> +
> + if (!rm_get_cq(dev, cmd->send_cq_handle) ||
> + !rm_get_cq(dev, cmd->recv_cq_handle)) {
> + pr_err("Invalid send_cqn or recv_cqn (%d, %d)\n",
> + cmd->send_cq_handle, cmd->recv_cq_handle);
> + return -EINVAL;
> + }
> +
> + qp = res_tbl_alloc(&dev->qp_tbl, &resp->qpn);
> + if (!qp) {
> + return -EINVAL;
> + }
> +
> + memset(qp, 0, sizeof(RmQP));
> +
> + memcpy(&qp->init_args, cmd, sizeof(*cmd));
> +
> + pr_dbg("qp_type=%d\n", qp->init_args.qp_type);
> + pr_dbg("send_cq_handle=%d\n", qp->init_args.send_cq_handle);
> + pr_dbg("max_send_sge=%d\n", qp->init_args.max_send_sge);
> + pr_dbg("recv_cq_handle=%d\n", qp->init_args.recv_cq_handle);
> + pr_dbg("max_recv_sge=%d\n", qp->init_args.max_recv_sge);
> + pr_dbg("total_chunks=%d\n", cmd->total_chunks);
> + pr_dbg("send_chunks=%d\n", cmd->send_chunks);
> + pr_dbg("recv_chunks=%d\n", cmd->total_chunks - cmd->send_chunks);
> +
> + qp->qp_state = PVRDMA_QPS_ERR;
> +
> + /* Get pointer to send & recv rings */
> + dir = pvrdma_pci_dma_map(pci_dev, qp->init_args.pdir_dma,
> TARGET_PAGE_SIZE);
> + if (!dir) {
> + pr_err("Failed to map QP page directory\n");
> + rc = -ENOMEM;
> + goto out_free_qp;
> + }
> + tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> + if (!tbl) {
> + pr_err("Failed to map QP page table\n");
> + rc = -ENOMEM;
> + goto out_free_qp;
> + }
> +
> + /* Send ring */
> + qp->sq_ring_state = (struct pvrdma_ring *)
> + pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> + if (!qp->sq_ring_state) {
> + pr_err("Failed to map QP header page\n");
> + rc = -ENOMEM;
> + goto out_free_qp;
> + }
> +
> + wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_sq_wqe_hdr) +
> + sizeof(struct pvrdma_sge) *
> + qp->init_args.max_send_sge);
> + sprintf(ring_name, "qp%d_sq", resp->qpn);
> + rc = ring_init(&qp->sq, ring_name, pci_dev, qp->sq_ring_state,
> + qp->init_args.max_send_wr, wqe_size,
> + (dma_addr_t *)&tbl[1], cmd->send_chunks);
> + if (rc != 0) {
> + pr_err("Failed to initialize SQ ring\n");
> + rc = -ENOMEM;
> + goto out_free_ring_state;
> + }
> +
> + /* Recv ring */
> + qp->rq_ring_state = &qp->sq_ring_state[1];
> + wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_rq_wqe_hdr) +
> + sizeof(struct pvrdma_sge) *
> + qp->init_args.max_recv_sge);
> + pr_dbg("wqe_size=%d\n", wqe_size);
> + pr_dbg("pvrdma_rq_wqe_hdr=%ld\n", sizeof(struct pvrdma_rq_wqe_hdr));
> + pr_dbg("pvrdma_sge=%ld\n", sizeof(struct pvrdma_sge));
> + pr_dbg("init_args.max_recv_sge=%d\n", qp->init_args.max_recv_sge);
> + sprintf(ring_name, "qp%d_rq", resp->qpn);
> + rc = ring_init(&qp->rq, ring_name, pci_dev, qp->rq_ring_state,
> + qp->init_args.max_recv_wr, wqe_size,
> + (dma_addr_t *)&tbl[2], cmd->total_chunks -
> + cmd->send_chunks - 1 /* first page is ring state */);
> + if (rc != 0) {
> + pr_err("Failed to initialize RQ ring\n");
> + rc = -ENOMEM;
> + goto out_free_send_ring;
> + }
> +
> + resp->max_send_wr = cmd->max_send_wr;
> + resp->max_recv_wr = cmd->max_recv_wr;
> + resp->max_send_sge = cmd->max_send_sge;
> + resp->max_recv_sge = cmd->max_recv_sge;
> + resp->max_inline_data = cmd->max_inline_data;
> +
> + goto out;
> +
> +out_free_send_ring:
> + ring_free(&qp->sq);
> +
> +out_free_ring_state:
> + pvrdma_pci_dma_unmap(pci_dev, qp->sq_ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_qp:
> + rm_dealloc_qp(dev, resp->qpn);
> +
> +out:
> + if (tbl) {
> + pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> + }
> + if (dir) {
> + pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> + }
> +
> + return rc;
> +}
> +
> +int rm_modify_qp(PVRDMADev *dev, __u32 qp_handle,
> + struct pvrdma_cmd_modify_qp *modify_qp_args)
> +{
> + RmQP *qp;
> +
> + pr_dbg("qp_handle=%d\n", qp_handle);
> + pr_dbg("new_state=%d\n", modify_qp_args->attrs.qp_state);
> +
> + qp = res_tbl_get(&dev->qp_tbl, qp_handle);
> + if (!qp) {
> + return -EINVAL;
> + }
> +
> + pr_dbg("qp_type=%d\n", qp->init_args.qp_type);
> +
> + if (modify_qp_args->attr_mask & PVRDMA_QP_PORT) {
> + qp->port_num = modify_qp_args->attrs.port_num - 1;
> + }
> + if (modify_qp_args->attr_mask & PVRDMA_QP_DEST_QPN) {
> + qp->dest_qp_num = modify_qp_args->attrs.dest_qp_num;
> + }
> + if (modify_qp_args->attr_mask & PVRDMA_QP_AV) {
> + qp->dgid = modify_qp_args->attrs.ah_attr.grh.dgid;
> + qp->port_num = modify_qp_args->attrs.ah_attr.port_num - 1;
> + }
> + if (modify_qp_args->attr_mask & PVRDMA_QP_STATE) {
> + qp->qp_state = modify_qp_args->attrs.qp_state;
> + }
> +
> + /* kdbr connection */
> + if (qp->qp_state == PVRDMA_QPS_RTR) {
> + qp->kdbr_connection_id =
> + kdbr_open_connection(dev->ports[qp->port_num].kdbr_port,
> + qp_handle, qp->dgid, qp->dest_qp_num,
> + qp->init_args.qp_type == PVRDMA_QPT_RC);
> + if (qp->kdbr_connection_id == 0) {
> + return -EIO;
> + }
> + }
> +
> + return 0;
> +}
> +
> +void rm_dealloc_qp(PVRDMADev *dev, __u32 qp_handle)
> +{
> + PCIDevice *pci_dev = PCI_DEVICE(dev);
> + RmQP *qp;
> +
> + qp = res_tbl_get(&dev->qp_tbl, qp_handle);
> + if (!qp) {
> + return;
> + }
> +
> + if (qp->kdbr_connection_id) {
> + kdbr_close_connection(dev->ports[qp->port_num].kdbr_port,
> + qp->kdbr_connection_id);
> + }
> +
> + ring_free(&qp->rq);
> + ring_free(&qp->sq);
> +
> + pvrdma_pci_dma_unmap(pci_dev, qp->sq_ring_state, TARGET_PAGE_SIZE);
> +
> + res_tbl_dealloc(&dev->qp_tbl, qp_handle);
> +}
> +
> +RmQP *rm_get_qp(PVRDMADev *dev, __u32 qp_handle)
> +{
> + return res_tbl_get(&dev->qp_tbl, qp_handle);
> +}
> +
> +void *rm_get_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id)
> +{
> + void **wqe_ctx;
> +
> + wqe_ctx = res_tbl_get(&dev->wqe_ctx_tbl, wqe_ctx_id);
> + if (!wqe_ctx) {
> + return NULL;
> + }
> +
> + pr_dbg("ctx=%p\n", *wqe_ctx);
> +
> + return *wqe_ctx;
> +}
> +
> +int rm_alloc_wqe_ctx(PVRDMADev *dev, unsigned long *wqe_ctx_id, void *ctx)
> +{
> + void **wqe_ctx;
> +
> + wqe_ctx = res_tbl_alloc(&dev->wqe_ctx_tbl, (u32 *)wqe_ctx_id);
> + if (!wqe_ctx) {
> + return -ENOMEM;
> + }
> +
> + pr_dbg("ctx=%p\n", ctx);
> + *wqe_ctx = ctx;
> +
> + return 0;
> +}
> +
> +void rm_dealloc_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id)
> +{
> + res_tbl_dealloc(&dev->wqe_ctx_tbl, (u32) wqe_ctx_id);
> +}
> +
> +int rm_init(PVRDMADev *dev)
> +{
> + int ret = 0;
> +
> + ret = res_tbl_init("PD", &dev->pd_tbl, MAX_PDS, sizeof(RmPD));
> + if (ret != 0) {
> + goto out;
> + }
> +
> + ret = res_tbl_init("CQ", &dev->cq_tbl, MAX_CQS, sizeof(RmCQ));
> + if (ret != 0) {
> + goto cln_pds;
> + }
> +
> + ret = res_tbl_init("MR", &dev->mr_tbl, MAX_MRS, sizeof(RmMR));
> + if (ret != 0) {
> + goto cln_cqs;
> + }
> +
> + ret = res_tbl_init("QP", &dev->qp_tbl, MAX_QPS, sizeof(RmQP));
> + if (ret != 0) {
> + goto cln_mrs;
> + }
> +
> + ret = res_tbl_init("WQE_CTX", &dev->wqe_ctx_tbl, MAX_QPS * MAX_QP_WRS,
> + sizeof(void *));
> + if (ret != 0) {
> + goto cln_qps;
> + }
> +
> + goto out;
> +
> +cln_qps:
> + res_tbl_free(&dev->qp_tbl);
> +
> +cln_mrs:
> + res_tbl_free(&dev->mr_tbl);
> +
> +cln_cqs:
> + res_tbl_free(&dev->cq_tbl);
> +
> +cln_pds:
> + res_tbl_free(&dev->pd_tbl);
> +
> +out:
> + if (ret != 0) {
> + pr_err("Failed to initialize RM\n");
> + }
> +
> + return ret;
> +}
> +
> +void rm_fini(PVRDMADev *dev)
> +{
> + res_tbl_free(&dev->pd_tbl);
> + res_tbl_free(&dev->cq_tbl);
> + res_tbl_free(&dev->mr_tbl);
> + res_tbl_free(&dev->qp_tbl);
> + res_tbl_free(&dev->wqe_ctx_tbl);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_rm.h b/hw/net/pvrdma/pvrdma_rm.h
> new file mode 100644
> index 0000000..1d42bc7
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_rm.h
> @@ -0,0 +1,214 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA - Resource Manager
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_RM_H
> +#define PVRDMA_RM_H
> +
> +#include <hw/net/pvrdma/pvrdma_dev_api.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +/* TODO: With more than one port, ib_modify_qp fails; possibly related to
> + * the MAC address of the second port */
> +#define MAX_PORTS 1 /* Driver forces 1, see pvrdma_add_gid */
> +#define MAX_PORT_GIDS 1
> +#define MAX_PORT_PKEYS 1
> +#define MAX_PKEYS 1
> +#define MAX_PDS 2048
> +#define MAX_CQS 2048
> +#define MAX_CQES 1024 /* cqe size is 64 */
> +#define MAX_QPS 1024
> +#define MAX_GIDS 2048
> +#define MAX_QP_WRS 1024 /* wqe size is 128 */
> +#define MAX_SGES 4
> +#define MAX_MRS 2048
> +#define MAX_AH 1024
> +
> +typedef struct PVRDMADev PVRDMADev;
> +typedef struct KdbrPort KdbrPort;
> +
> +#define MAX_RMRESTBL_NAME_SZ 16
> +typedef struct RmResTbl {
> + char name[MAX_RMRESTBL_NAME_SZ];
> + unsigned long *bitmap;
> + size_t tbl_sz;
> + size_t res_sz;
> + void *tbl;
> + QemuMutex lock;
> +} RmResTbl;
> +
> +enum cq_comp_type {
> + CCT_NONE,
> + CCT_SOLICITED,
> + CCT_NEXT_COMP,
> +};
> +
> +typedef struct RmPD {
> + __u32 ctx_handle;
> +} RmPD;
> +
> +typedef struct RmCQ {
> + struct pvrdma_cmd_create_cq init_args;
> + struct pvrdma_ring *ring_state;
> + Ring cq;
> + enum cq_comp_type comp_type;
> +} RmCQ;
> +
> +/* MR (DMA region) */
> +typedef struct RmMR {
> + __u32 pd_handle;
> + __u32 lkey;
> + __u32 rkey;
> +} RmMR;
> +
> +typedef struct RmSqWqe {
> + struct pvrdma_sq_wqe_hdr hdr;
> + struct pvrdma_sge sge[0];
> +} RmSqWqe;
> +
> +typedef struct RmRqWqe {
> + struct pvrdma_rq_wqe_hdr hdr;
> + struct pvrdma_sge sge[0];
> +} RmRqWqe;
> +
> +typedef struct RmQP {
> + struct pvrdma_cmd_create_qp init_args;
> + enum pvrdma_qp_state qp_state;
> + u8 port_num;
> + u32 dest_qp_num;
> + union pvrdma_gid dgid;
> +
> + struct pvrdma_ring *sq_ring_state;
> + Ring sq;
> + struct pvrdma_ring *rq_ring_state;
> + Ring rq;
> +
> + unsigned long kdbr_connection_id;
> +} RmQP;
> +
> +typedef struct RmPort {
> + enum pvrdma_port_state state;
> + union pvrdma_gid gid_tbl[MAX_PORT_GIDS];
> + /* TODO: Change type */
> + int *pkey_tbl;
> + KdbrPort *kdbr_port;
> +} RmPort;
> +
> +static inline int rm_get_max_port_gids(__u32 *max_port_gids)
> +{
> + *max_port_gids = MAX_PORT_GIDS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_port_pkeys(__u32 *max_port_pkeys)
> +{
> + *max_port_pkeys = MAX_PORT_PKEYS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_pkeys(__u16 *max_pkeys)
> +{
> + *max_pkeys = MAX_PKEYS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_cqs(__u32 *max_cqs)
> +{
> + *max_cqs = MAX_CQS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_cqes(__u32 *max_cqes)
> +{
> + *max_cqes = MAX_CQES;
> + return 0;
> +}
> +
> +static inline int rm_get_max_pds(__u32 *max_pds)
> +{
> + *max_pds = MAX_PDS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_qps(__u32 *max_qps)
> +{
> + *max_qps = MAX_QPS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_gids(__u32 *max_gids)
> +{
> + *max_gids = MAX_GIDS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_qp_wrs(__u32 *max_qp_wrs)
> +{
> + *max_qp_wrs = MAX_QP_WRS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_sges(__u32 *max_sges)
> +{
> + *max_sges = MAX_SGES;
> + return 0;
> +}
> +
> +static inline int rm_get_max_mrs(__u32 *max_mrs)
> +{
> + *max_mrs = MAX_MRS;
> + return 0;
> +}
> +
> +static inline int rm_get_phys_port_cnt(__u8 *phys_port_cnt)
> +{
> + *phys_port_cnt = MAX_PORTS;
> + return 0;
> +}
> +
> +static inline int rm_get_max_ah(__u32 *max_ah)
> +{
> + *max_ah = MAX_AH;
> + return 0;
> +}
> +
> +int rm_init(PVRDMADev *dev);
> +void rm_fini(PVRDMADev *dev);
> +
> +int rm_alloc_pd(PVRDMADev *dev, __u32 *pd_handle, __u32 ctx_handle);
> +void rm_dealloc_pd(PVRDMADev *dev, __u32 pd_handle);
> +
> +RmCQ *rm_get_cq(PVRDMADev *dev, __u32 cq_handle);
> +int rm_alloc_cq(PVRDMADev *dev, struct pvrdma_cmd_create_cq *cmd,
> + struct pvrdma_cmd_create_cq_resp *resp);
> +void rm_req_notify_cq(PVRDMADev *dev, __u32 cq_handle, u32 flags);
> +void rm_dealloc_cq(PVRDMADev *dev, __u32 cq_handle);
> +
> +int rm_alloc_mr(PVRDMADev *dev, struct pvrdma_cmd_create_mr *cmd,
> + struct pvrdma_cmd_create_mr_resp *resp);
> +void rm_dealloc_mr(PVRDMADev *dev, __u32 mr_handle);
> +
> +RmQP *rm_get_qp(PVRDMADev *dev, __u32 qp_handle);
> +int rm_alloc_qp(PVRDMADev *dev, struct pvrdma_cmd_create_qp *cmd,
> + struct pvrdma_cmd_create_qp_resp *resp);
> +int rm_modify_qp(PVRDMADev *dev, __u32 qp_handle,
> + struct pvrdma_cmd_modify_qp *modify_qp_args);
> +void rm_dealloc_qp(PVRDMADev *dev, __u32 qp_handle);
> +
> +void *rm_get_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id);
> +int rm_alloc_wqe_ctx(PVRDMADev *dev, unsigned long *wqe_ctx_id, void *ctx);
> +void rm_dealloc_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_types.h b/hw/net/pvrdma/pvrdma_types.h
> new file mode 100644
> index 0000000..22a7cde
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_types.h
> @@ -0,0 +1,37 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_TYPES_H
> +#define PVRDMA_TYPES_H
> +
> +/* TODO: All definitions here should be removed */
> +
> +#include <stdint.h>
> +#include <asm-generic/int-ll64.h>
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef uint8_t __u8;
> +typedef uint8_t u8;
> +typedef unsigned short __u16;
> +typedef unsigned short u16;
> +typedef uint64_t u64;
> +typedef uint32_t u32;
> +typedef uint32_t __u32;
> +typedef int32_t __s32;
> +#define __bitwise
> +typedef __u64 __bitwise __be64;
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_utils.c b/hw/net/pvrdma/pvrdma_utils.c
> new file mode 100644
> index 0000000..0f420e2
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_utils.c
> @@ -0,0 +1,36 @@
> +#include <qemu/osdep.h>
> +#include <cpu.h>
> +#include <hw/pci/pci.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/pvrdma.h>
> +
> +void pvrdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len)
> +{
> + pr_dbg("%p\n", buffer);
> + pci_dma_unmap(dev, buffer, len, DMA_DIRECTION_TO_DEVICE, 0);
> +}
> +
> +void *pvrdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen)
> +{
> + void *p;
> + hwaddr len = plen;
> +
> + if (!addr) {
> + pr_dbg("addr is NULL\n");
> + return NULL;
> + }
> +
> + p = pci_dma_map(dev, addr, &len, DMA_DIRECTION_TO_DEVICE);
> + if (!p) {
> + return NULL;
> + }
> +
> + if (len != plen) {
> + pvrdma_pci_dma_unmap(dev, p, len);
> + return NULL;
> + }
> +
> + pr_dbg("0x%llx -> %p (len=%ld)\n", (long long unsigned int)addr, p, len);
> +
> + return p;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_utils.h b/hw/net/pvrdma/pvrdma_utils.h
> new file mode 100644
> index 0000000..da01967
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_utils.h
> @@ -0,0 +1,49 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + * Yuval Shaia <address@hidden>
> + * Marcel Apfelbaum <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_UTILS_H
> +#define PVRDMA_UTILS_H
> +
> +#define pr_info(fmt, ...) \
> + fprintf(stdout, "%s: %-20s (%3d): " fmt, "pvrdma", __func__, __LINE__,\
> + ## __VA_ARGS__)
> +
> +#define pr_err(fmt, ...) \
> + fprintf(stderr, "%s: Error at %-20s (%3d): " fmt, "pvrdma", __func__, \
> + __LINE__, ## __VA_ARGS__)
> +
> +#define DEBUG
> +#ifdef DEBUG
> +#define pr_dbg(fmt, ...) \
> + fprintf(stdout, "%s: %-20s (%3d): " fmt, "pvrdma", __func__, __LINE__,\
> + ## __VA_ARGS__)
> +#else
> +#define pr_dbg(fmt, ...)
> +#endif
> +
> +static inline int roundup_pow_of_two(int x)
> +{
> + x--;
> + x |= (x >> 1);
> + x |= (x >> 2);
> + x |= (x >> 4);
> + x |= (x >> 8);
> + x |= (x >> 16);
> + return x + 1;
> +}
> +
> +void pvrdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len);
> +void *pvrdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen);
> +
> +#endif
> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> index d77ca60..a016ad6 100644
> --- a/include/hw/pci/pci_ids.h
> +++ b/include/hw/pci/pci_ids.h
> @@ -167,4 +167,7 @@
> #define PCI_VENDOR_ID_TEWS 0x1498
> #define PCI_DEVICE_ID_TEWS_TPCI200 0x30C8
>
> +#define PCI_VENDOR_ID_VMWARE 0x15ad
> +#define PCI_DEVICE_ID_VMWARE_PVRDMA 0x0820
> +
> #endif
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to address@hidden
> More majordomo info at http://vger.kernel.org/majordomo-info.html