Re: [PATCH V4 02/19] physmem: fd-based shared memory

qemu-devel

From:	Steven Sistare
Subject:	Re: [PATCH V4 02/19] physmem: fd-based shared memory
Date:	Thu, 12 Dec 2024 15:38:00 -0500
User-agent:	Mozilla Thunderbird

On 12/9/2024 2:42 PM, Peter Xu wrote:

On Mon, Dec 02, 2024 at 05:19:54AM -0800, Steve Sistare wrote:

@@ -2089,13 +2154,23 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
      new_block->page_size = qemu_real_host_page_size();
      new_block->host = host;
      new_block->flags = ram_flags;
+
+    if (!host && !xen_enabled()) {


Adding one more xen check is unnecessary.  That this patch needs it could
mean that the patch can be refactored, because we already have xen checks
in both ram_block_add() and the fd allocation path.

In the meantime, see:

qemu_ram_alloc_from_fd():
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
         error_setg(errp,
                    "host lacks kvm mmu notifiers, -mem-path unsupported");
         return NULL;
     }

I don't think any decent kernel could hit this, but that could be another
sign that this patch duplicated some file allocations.

+        if ((new_block->flags & RAM_SHARED) &&
+            !qemu_ram_alloc_shared(new_block, &local_err)) {
+            goto err;
+        }
+    }
+
      ram_block_add(new_block, &local_err);
-    if (local_err) {
-        g_free(new_block);
-        error_propagate(errp, local_err);
-        return NULL;
+    if (!local_err) {
+        return new_block;
      }
-    return new_block;
+
+err:
+    g_free(new_block);
+    error_propagate(errp, local_err);
+    return NULL;
  }


IIUC we only need to conditionally convert an anon-allocation into an
fd-allocation; then we don't need to mostly duplicate
qemu_ram_alloc_from_fd(), we can reuse it instead.

I do have a few other comments elsewhere, but when I was trying to comment.
E.g., we either shouldn't need to bother caching qemu_memfd_check()
results, or should do it in qemu_memfd_check() directly, and some more.


Someone thought it a good idea to cache the result of qemu_memfd_alloc_check,
and qemu_memfd_check will be called more often.  I'll cache the result inside
qemu_memfd_check for the special case of flags=0.
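
A minimal standalone sketch of that kind of caching, for illustration only.
The names memfd_probe and memfd_check_cached are hypothetical, the probe uses
the raw Linux syscall rather than QEMU's helpers, and the real
qemu_memfd_check may use different flags:

```c
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

static int probe_count;   /* number of real kernel probes, for illustration */

/* Hypothetical uncached probe: can the kernel create a memfd with 'flags'? */
static bool memfd_probe(unsigned int flags)
{
    /* Raw syscall so the sketch does not depend on _GNU_SOURCE. */
    int fd = (int)syscall(SYS_memfd_create, "probe", flags);

    probe_count++;
    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}

/* Cache only the flags == 0 case; other flag sets are probed every time. */
static bool memfd_check_cached(unsigned int flags)
{
    static int cached = -1;   /* -1: not probed yet, 0: no, 1: yes */

    if (flags != 0) {
        return memfd_probe(flags);
    }
    if (cached < 0) {
        cached = memfd_probe(0);
    }
    return cached;
}
```

The static cache here is not protected against concurrent first calls; a
real implementation would have to decide whether that matters for its callers.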

Then I think it's easier if I provide a patch, to also show that a smaller
change can do the same thing, with everything fixed up (e.g. addressing
the missing mmu notifier check above).  What do you think of the below?


The key change you make is calling qemu_ram_alloc_from_fd instead of
file_ram_alloc, which buys the xen and kvm checks for free.  Sounds good,
I will do that in the context of my patch.
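
For readers following along, the fd-backed path boils down to: get an
anonymous fd, size it, and map it MAP_SHARED.  A rough standalone sketch on
Linux follows.  It is not QEMU code: alloc_shared_from_anon_fd is a made-up
name, and the xen and kvm checks that qemu_ram_alloc_from_fd performs are
not modeled.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous memfd, size it, and map it shared.
 * Returns the mapping (or NULL on failure) and stores the fd in *fd_out so
 * the caller can keep it, e.g. to hand it to another process.
 */
static void *alloc_shared_from_anon_fd(size_t size, int *fd_out)
{
    /* Raw syscall so the sketch does not depend on _GNU_SOURCE. */
    int fd = (int)syscall(SYS_memfd_create, "anon-memfd", 0U);
    void *p;

    if (fd < 0) {
        return NULL;
    }
    /* Mark the fd close-on-exec and give the zero-sized memfd its size. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0 ||
        ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}
```

Because the mapping is MAP_SHARED and backed by the fd, a second mmap of the
same fd (in this process or in one that received the fd) sees the same pages.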

Here are some other changes in your patch, and my responses:

I will drop the "Retrying using MAP_ANON|MAP_SHARED" message, as you did.

However, I am keeping QEMU_VMALLOC_ALIGN, qemu_set_cloexec, and
trace_qemu_ram_alloc_shared.

Also, when qemu_memfd_create + qemu_ram_alloc_from_fd fails, QEMU should
fail and exit, and not fall back, because something unexpected went wrong.
David said the same.  Thus we still need to pass errp to
qemu_memfd_create().
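
The difference between the two policies can be sketched as a toy model.
Everything here is hypothetical (struct env, the enum, and both function
names); it only models which outcome each policy picks, and assumes that
falling back remains acceptable when the memfd probe itself reports no
support:

```c
#include <stdbool.h>

/* Hypothetical outcome knobs standing in for the real calls. */
struct env {
    bool memfd_supported;   /* result of a qemu_memfd_check(0)-style probe */
    bool fd_create_ok;      /* does creating the fd succeed? */
    bool fd_alloc_ok;       /* does the fd-based block allocation succeed? */
};

enum result { ALLOC_FD, ALLOC_ANON, ALLOC_ERROR };

/* Fallback policy: any failure on the fd path silently uses anon memory. */
static enum result alloc_with_fallback(const struct env *e)
{
    if (e->memfd_supported && e->fd_create_ok && e->fd_alloc_ok) {
        return ALLOC_FD;
    }
    return ALLOC_ANON;
}

/* Fail-hard policy: fall back only when the mechanism is unavailable. */
static enum result alloc_fail_hard(const struct env *e)
{
    if (!e->memfd_supported) {
        return ALLOC_ANON;      /* expected case, e.g. no memfd support */
    }
    if (!e->fd_create_ok || !e->fd_alloc_ok) {
        return ALLOC_ERROR;     /* unexpected: report the error and exit */
    }
    return ALLOC_FD;
}
```

In the fail-hard variant the error has to reach the caller, which is why the
creation call still needs a non-NULL errp.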

I will push the qemu_shm_alloc ERRP_GUARD back to patch
  "factor out allocation of anonymous shared memory"

- Steve

===8<===
 From a90119131a972b0b4f15770fe0b431770456e447 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Mon, 9 Dec 2024 13:38:06 -0500
Subject: [PATCH] physmem: Try to always allocate anon and shared memory with
  fd

qemu_ram_alloc_internal() is the memory API QEMU uses to allocate anonymous
memory.  It allows RAM_SHARED too on top of anonymous.

It may always be beneficial to allocate memory with an fd attached whenever
possible, because an fd is normally more flexible compared to the virtual
mapping alone.  For example, CPR can use it to pass fds between processes
to share memory, which is especially useful when the memory can be pinned.

Since there's no harm when it's possible, do it unconditionally for all
such anonymous & shared memory allocations where the memory is to be
allocated.  Provide fallbacks for when it fails, e.g., when no fd-backed
memory mechanism (memfd or POSIX shm) is available.

Two extra ERRP_GUARD()s are needed in the functions used, as we do not
care about the error even if one happens, so it's easier to allow passing
NULL into them.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
  system/physmem.c   | 38 ++++++++++++++++++++++++++++++++++++++
  util/memfd.c       |  2 ++
  util/oslib-posix.c |  2 ++
  3 files changed, 42 insertions(+)

diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..4e795aefa0 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -47,6 +47,7 @@
  #include "qemu/qemu-print.h"
  #include "qemu/log.h"
  #include "qemu/memalign.h"
+#include "qemu/memfd.h"
  #include "exec/memory.h"
  #include "exec/ioport.h"
  #include "sysemu/dma.h"
@@ -2057,6 +2058,24 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
  }
  #endif

+/*
+ * Try to allocate a zero-sized anonymous fd for shared memory allocations.
+ * Returns >=0 if succeeded, <0 otherwise.
+ *
+ * Prioritize memfd, as it doesn't have the /dev/shm size limitation
+ * that POSIX shm_open() has.
+ */
+static int qemu_ram_alloc_anonymous_fd(void)
+{
+    if (qemu_memfd_check(0)) {
+        return qemu_memfd_create("anon-memfd", 0, 0, 0, 0, NULL);
+    } else if (qemu_shm_available()) {
+        return qemu_shm_alloc(0, NULL);
+    } else {
+        return -1;
+    }
+}
+
  static
  RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                                    void (*resized)(const char*,
@@ -2073,6 +2092,25 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                            RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
      assert(!host ^ (ram_flags & RAM_PREALLOC));

+    /*
+     * Try to use fd-based allocation for anonymous and shared memory,
+     * because fd is normally more flexible (e.g. on memory sharing between
+     * processes).  We can still fallback to old ways if it fails.
+     */
+    if (!host && (ram_flags & RAM_SHARED)) {
+        int fd = qemu_ram_alloc_anonymous_fd();
+
+        if (fd >= 0) {
+            new_block = qemu_ram_alloc_from_fd(size, mr, ram_flags,
+                                               fd, 0, errp);
+            if (new_block) {
+                return new_block;
+            }
+            close(fd);
+        }
+        /* Either fd or ramblock allocation failed, fallback */
+    }
+
      align = qemu_real_host_page_size();
      align = MAX(align, TARGET_PAGE_SIZE);
      size = ROUND_UP(size, align);
diff --git a/util/memfd.c b/util/memfd.c
index 8a2e906962..0dc15b2f44 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -52,6 +52,8 @@ int qemu_memfd_create(const char *name, size_t size, bool hugetlb,
  {
      int htsize = hugetlbsize ? ctz64(hugetlbsize) : 0;

+    ERRP_GUARD();
+
      if (htsize && 1ULL << htsize != hugetlbsize) {
          error_setg(errp, "Hugepage size must be a power of 2");
          return -1;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f8c3724e68..6ca3e994fc 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -944,6 +944,8 @@ int qemu_shm_alloc(size_t size, Error **errp)
      static int sequence;
      mode_t mode;

+    ERRP_GUARD();
+
      cur_sequence = qatomic_fetch_inc(&sequence);

/*
