qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic


From: Michael R. Hines
Subject: Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic
Date: Tue, 25 Jun 2013 14:38:25 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 06/25/2013 12:31 PM, Vasilis Liaskovitis wrote:
Hi,

On Mon, Jun 24, 2013 at 09:58:01PM -0400, address@hidden wrote:
From: "Michael R. Hines" <address@hidden>

[...]
+/*
+ * Put in the log file which RDMA device was opened and the details
+ * associated with that device.
+ */
+static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
+{
+    printf("%s RDMA Device opened: kernel name %s "
+           "uverbs device name %s, "
+           "infiniband_verbs class device path %s,"
+           " infiniband class device path %s\n",
+                who,
+                verbs->device->name,
+                verbs->device->dev_name,
+                verbs->device->dev_path,
+                verbs->device->ibdev_path);
+}
see below

+static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
+{
+    int ret = -EINVAL, idx;
+    struct sockaddr_in sin;
+    struct rdma_cm_id *listen_id;
+    char ip[40] = "unknown";
+
+    for (idx = 0; idx < RDMA_CONTROL_MAX_WR; idx++) {
+        rdma->wr_data[idx].control_len = 0;
+        rdma->wr_data[idx].control_curr = NULL;
+    }
+
+    if (rdma->host == NULL) {
+        ERROR(errp, "RDMA host is not set!\n");
+        rdma->error_state = -EINVAL;
+        return -1;
+    }
+    /* create CM channel */
+    rdma->channel = rdma_create_event_channel();
+    if (!rdma->channel) {
+        ERROR(errp, "could not create rdma event channel\n");
+        rdma->error_state = -EINVAL;
+        return -1;
+    }
+
+    /* create CM id */
+    ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
+    if (ret) {
+        ERROR(errp, "could not create cm_id!\n");
+        goto err_dest_init_create_listen_id;
+    }
+
+    memset(&sin, 0, sizeof(sin));
+    sin.sin_family = AF_INET;
+    sin.sin_port = htons(rdma->port);
+
+    if (rdma->host && strcmp("", rdma->host)) {
+        struct hostent *dest_addr;
+        dest_addr = gethostbyname(rdma->host);
+        if (!dest_addr) {
+            ERROR(errp, "migration could not gethostbyname!\n");
+            ret = -EINVAL;
+            goto err_dest_init_bind_addr;
+        }
+        memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr,
+                dest_addr->h_length);
+        inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip);
+    } else {
+        sin.sin_addr.s_addr = INADDR_ANY;
+    }
+
+    DPRINTF("%s => %s\n", rdma->host, ip);
+
+    ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin);
+    if (ret) {
+        ERROR(errp, "Error: could not rdma_bind_addr!\n");
+        goto err_dest_init_bind_addr;
+    }
+
+    rdma->listen_id = listen_id;
+    if (listen_id->verbs) {
+        rdma->verbs = listen_id->verbs;
+    }
+    qemu_rdma_dump_id("dest_init", rdma->verbs);

I wonder if you have ever hit the case where rdma_bind_addr() does not set the
verbs structure in listen_id because we are binding to the loopback device (also
see linux kernel commit 8523c048).
I keep hitting this case on my destination VM ("incoming x-rdma:host:port)

Then I think qemu_rdma_dump_id can segfault trying to dereference a null verbs
structure. The dump_id function should check for non-NULL verbs argument,
or the dump should be made only in the (verbs != NULL) if clause.

Disabling the dump_id above, I have rdma_resolve_addr() problems on the source
VM side (getting RDMA_CM_EVENT_ADDR_ERROR instead of
RDMA_CM_EVENT_ADDR_RESOLVED).

I assume that is because of the null verbs structure destination problem above.
qemu_rdma_dest_prepare() will always fail with a NULL verbs argument:

Good catch, thank you. I'll fix this immediately in the next version.

I never tried binding to the localhost before......

+
+static int qemu_rdma_dest_prepare(RDMAContext *rdma, Error **errp)
+{
+    int ret;
+    int idx;
+
+    if (!rdma->verbs) {
+        ERROR(errp, "no verbs context!\n");
+        return 0;
+    }
It is first called from rdma_start_incoming_migration() and will fail with the
loopback binding case (rdma->verbs == NULL).

however later qemu_rdma_accept() will check against the incoming cm_event
verbs structure and set the RDMAContext's verb struct, calling
qemu_rdma_dest_prepare with that struct:

[...]
+static int qemu_rdma_accept(RDMAContext *rdma)
[...]
+    if (!rdma->verbs) {
+        rdma->verbs = verbs;
+        /*
+         * Cannot propagate errp, as there is no error pointer
+         * to be propagated.
+         */
+        ret = qemu_rdma_dest_prepare(rdma, NULL);
+        if (ret) {
+            fprintf(stderr, "rdma migration: error preparing dest!\n");
+            goto err_rdma_dest_wait;
+        }
Are these two cases intentionally different?

Another good catch. We should definitely allow loopback RDMA without any issues.

I will fix this immediately as well in the next version.

- Michael




reply via email to

[Prev in Thread] Current Thread [Next in Thread]