From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCHv2] NVMe: Initial commit for NVM Express device
Date: Thu, 21 Feb 2013 12:15:09 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Dec 14, 2012 at 05:44:37PM -0700, Keith Busch wrote:
> NVM Express is an open standard for PCI-e attached Non-Volatile Memory
> storage. This commit adds an emulated device that supports the register
> interface and command set defined by this standard. The standard can be
> viewed at nvmexpress.org.
> 
> Cc: Michael S. Tsirkin <address@hidden>
> Cc: Keith Busch <address@hidden>
> Signed-off-by: Keith Busch <address@hidden>
> ---
> Thanks to everyone I've received comments from on the first attempt.
> Apparently there are _rules_ my emulated device needs to follow in
> order to run correctly in this environment. :)
> 
> The biggest change is that none of the routines run in their own
> threads anymore. They are all timer callbacks that take the BQL. This
> single-threads the processing I was trying to parallelize before, so
> some of the code is simpler, albeit a little slower. The device uses
> the qemu block interface to communicate with the backing storage that
> represents the non-volatile memory of the nvme device.
> 
> I split the code into a header and a c file. I know this is fairly
> large, but I don't see a nice way to significantly split it into
> multiple commits without crippling the functionality. There are some
> optional features here that could potentially be removed from the
> initial commit, but they don't take an appreciable amount of code
> space.
> 
> A lot had to change from the first revision, so my apologies if I missed or
> misunderstood something from the previous comments.

First of all, sorry for not commenting on this patch earlier. I was
planning to have a look since I saw the mail in early January, but I
never actually got around to doing it.

The patch didn't even compile when I first applied it to my working
copy; I suspect the include file reorganisation is to blame, so it
should be easy enough to fix.

I didn't review much of it in great detail, but I have taken a look at
the parts interfacing with the block layer. I'll add some comments
inline:


> +#define min(x, y) ((x) < (y) ? (x) : (y))

qemu-common.h already has MIN().
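For example (untested), the local macro could simply be dropped and the
call sites switched over to the existing one:

    req_len = trans_len = MIN(len, trans_len);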

> +static uint16_t nvme_map_prp(NvmeRequest *req, uint64_t prp1, uint64_t prp2,
> +    uint32_t len, int data_dir, NvmeCtrl *n)
> +{
> +    hwaddr req_len, trans_len = n->page_size - (prp1 % n->page_size);
> +    req_len = trans_len = min(len, trans_len);
> +    int num_prps = (len >> n->page_bits) + 1;
> +    void *addr;
> +
> +    if (!prp1) {
> +        NVME_LOG(ERR, "null prp1");
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    NVME_LOG(IO_DBG,
> +        "ctrl:%u page size:%u prp1:%"PRIx64" prp2:%"PRIx64" len:%u dir:%d",
> +        n->instance, n->page_size, prp1, prp2, len, data_dir);
> +
> +    qemu_iovec_init(&req->qiov, num_prps);
> +    addr = cpu_physical_memory_map(prp1, &trans_len, data_dir);
> +    if (!addr || req_len != trans_len) {
> +        NVME_LOG(ERR,
> +            "unable to map data bytes:%"PRIu64" from address:%"PRIx64"",
> +            trans_len, prp1);
> +        return NVME_INTERNAL_DEV_ERROR;
> +    }
> +    NVME_LOG(IO_DBG, "mapped prp1:%"PRIx64" to %p len:%"PRIu64"", prp1, addr,
> +        trans_len);
> +    qemu_iovec_add(&req->qiov, addr, trans_len);
> +
> +    len -= trans_len;
> +    if (len) {
> +        if (!prp2) {
> +            NVME_LOG(ERR, "null prp2");
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +        if (len > n->page_size) {
> +            uint64_t prp_list[n->max_prp_ents], nents, prp_trans;
> +            int i = 0;
> +
> +            nents = (uint64_t)((len + n->page_size - 1) >> n->page_bits);
> +            prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t);
> +            cpu_physical_memory_rw(prp2, (uint8_t *)prp_list, prp_trans, 0);
> +
> +            while (len != 0) {
> +                if (i == n->max_prp_ents - 1 && len > n->page_size) {
> +                    if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) {
> +                        NVME_LOG(ERR,
> +                            "null or unaligned prp chain:%u entry %"PRIx64"",
> +                            i, prp_list[i]);
> +                        return NVME_INVALID_FIELD | NVME_DNR;
> +                    }
> +                    nents = (uint64_t)((len + n->page_size - 1) >>
> +                        n->page_bits);
> +                    prp_trans = min(n->max_prp_ents, nents) * sizeof(uint64_t);
> +                    cpu_physical_memory_rw(prp_list[i], (uint8_t *)prp_list,
> +                        prp_trans, 0);
> +                    i = 0;
> +                }
> +                if (!prp_list[i] || prp_list[i] & (n->page_size - 1)) {
> +                    NVME_LOG(ERR,
> +                        "null or unaligned prp list:%u entry %"PRIx64"",
> +                        i, prp_list[i]);
> +                    return NVME_INVALID_FIELD | NVME_DNR;
> +                }
> +
> +                req_len = trans_len = min(len, n->page_size);
> +                addr = cpu_physical_memory_map(prp_list[i], &trans_len,
> +                    data_dir);
> +                if (!addr || req_len != trans_len) {
> +                    NVME_LOG(ERR,
> +                        "unable to map addr%"PRIu64" bytes:%"PRIx64"",
> +                        prp_list[i], trans_len);
> +                    return NVME_INTERNAL_DEV_ERROR;
> +                }
> +                NVME_LOG(IO_DBG, "mapped prp[%u]:%"PRIx64" to %p len:%"PRIu64"",
> +                    i, prp_list[i], addr, trans_len);
> +                qemu_iovec_add(&req->qiov, addr, trans_len);
> +
> +                len -= trans_len;
> +                i++;
> +            }
> +        } else {
> +            if (prp2 & (n->page_size - 1)) {
> +                NVME_LOG(ERR, "prp2 alignment");
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            }
> +            req_len = trans_len = len;
> +            addr = cpu_physical_memory_map(prp2, &trans_len, data_dir);
> +            if (!addr || req_len != trans_len) {
> +                NVME_LOG(ERR,
> +                    "unable to map data bytes:%"PRIu64" from address:%"PRIx64"",
> +                    trans_len, prp2);
> +                return NVME_INTERNAL_DEV_ERROR;
> +            }
> +            NVME_LOG(IO_DBG, "mapped prp2:%"PRIx64" to %p len:%"PRIu64"",
> +                prp2, addr, trans_len);
> +            qemu_iovec_add(&req->qiov, addr, trans_len);
> +        }
> +    }
> +    return NVME_SUCCESS;
> +}

If I understand right, this maps the guest memory so that it can be
passed to the block layer. Is there any additional functionality
compared to dma-helpers.c here that meant you had to reimplement this,
or is it just that you weren't aware of the DMA helpers?

They also take care of cases where you can't map the whole area at once,
in which case you seem to error out.
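To illustrate (a rough sketch only, untested and written from memory of
the dma-helpers API, so check dma.h for the exact signatures; sector_num
and nvme_rw_cb are made-up names here): instead of mapping each PRP
yourself, you could collect the entries in a QEMUSGList and hand it to
the helpers, which map as much as they can, submit the I/O, and continue
with the remainder once it completes:

    /* Sketch: gather the PRP entries into a scatter/gather list. */
    qemu_sglist_init(&req->qsg, num_prps, pci_dma_context(&n->dev));
    qemu_sglist_add(&req->qsg, prp1, trans_len);
    /* ... one qemu_sglist_add() per PRP/PRP list entry ... */

    /* dma_bdrv_read()/dma_bdrv_write() take care of the mapping,
     * including the partial-mapping case, and call nvme_rw_cb when
     * the request is done. */
    dma_bdrv_read(n->conf.bs, &req->qsg, sector_num, nvme_rw_cb, req);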


> +static int nvme_init(PCIDevice *pci_dev)
> +{
> +    NvmeCtrl *n = DO_UPCAST(NvmeCtrl, dev, pci_dev);
> +    uint8_t *pci_conf;
> +
> +    NVME_LOG(DBG, "new controller B:D.f: %02x:%02x.%u",
> +        pci_bus_num(pci_dev->bus), PCI_SLOT(pci_dev->devfn),
> +        PCI_FUNC(pci_dev->devfn));
> +    if (!n->conf.bs) {
> +        NVME_LOG(ERR, "drive property not set");
> +        return -1;
> +    }
> +    if (n->num_namespaces == 0 || n->num_namespaces > NVME_MAX_NUM_NAMESPACES) {
> +        NVME_LOG(ERR, "requested invalid number of namespace:%u max:%u",
> +            n->num_namespaces, NVME_MAX_NUM_NAMESPACES);
> +        return -1;
> +    }
> +    if (n->num_queues < 1 || n->num_queues > NVME_MAX_QS) {
> +        NVME_LOG(ERR, "requested invalid number of queues:%u max:%u",
> +            n->num_queues, NVME_MAX_QS);
> +        return -1;
> +    }
> +    if (n->db_stride > NVME_MAX_STRIDE) {
> +        NVME_LOG(ERR, "requested invalid stride:%u max:%u",
> +            n->db_stride, NVME_MAX_STRIDE);
> +        return -1;
> +    }
> +    if (n->max_q_ents < 1 || n->max_q_ents >
> +            NVME_MAX_QUEUE_ENTRIES) {
> +        NVME_LOG(ERR, "requested invalid queue entries:%u, max:%u",
> +            n->max_q_ents, NVME_MAX_QUEUE_ENTRIES);
> +        return -1;
> +    }
> +    if (n->cqr > 1) {
> +        NVME_LOG(ERR,
> +            "requested invalid contiguous regions required:%u max:%u",
> +            n->cqr, 1);
> +        return -1;
> +    }
> +
> +    n->reg_size = 1 << qemu_fls(0x1004 + 2 * (n->num_queues + 1) *
> +        (4 << n->db_stride));
> +    n->ns_size = ((bdrv_getlength(n->conf.bs)) / BYTES_PER_MB) /
> +        n->num_namespaces;

bdrv_getlength() can fail, so you should check for errors before using
the return value.
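Something along these lines (sketch only; the error message is just an
example):

    int64_t len = bdrv_getlength(n->conf.bs);
    if (len < 0) {
        NVME_LOG(ERR, "failed to get backing storage length");
        return -1;
    }
    n->ns_size = (len / BYTES_PER_MB) / n->num_namespaces;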

Not a whole lot of comments, but hopefully better than nothing.

Kevin


