Handling large allocations (bypassing mm?)
From: Julian Andres Klode
Subject: Handling large allocations (bypassing mm?)
Date: Wed, 14 Dec 2022 14:21:49 +0100
Hi,
I want to bring a discussion here that I have mostly been having with
myself on IRC over the past few days.
As some of you know, we have had a couple of issues with large initrds
in Ubuntu, and Jeremy posted a patch series about mmunlimited earlier.
I wanted to propose a more fine-grained approach, as well as a
more generic approach to handling large allocations.
The first issue one experiences when opening large initrds is
that grub_file_open() calls grub_verifier_open() which simply
grub_malloc()s a buffer for the size of the file.
Later, for the initrd, we have to allocate it a second time. In
the upstream tree that happens via the relocator; in the rhboot
tree it is allocated directly from EFI.
Now my basic proposal is quite simple: We make grub_malloc()
and that relocator allocation code bypass the grub memory
management altogether and just do raw EFI page allocations
(provide two function pointers grub_mm_allocate_pages and
grub_mm_free_pages, and just call them if allocation size
is large[1]). For example, at the start of grub_malloc():
  if (size > @100 pages@ && grub_mm_allocate_pages != NULL) {
    ret = grub_mm_allocate_pages_below(@4GB@, ..., ROUND_TO_PAGES(size));
    if (ret == NULL)
      ret = grub_mm_allocate_pages_below(@infinity@, ...,
                                         ROUND_TO_PAGES(size));
    return ret;
  }
Allocating those below 4GB and only falling back to >4GB when we
run out of space allows us to avoid most issues where DMA fails
above 4GB.
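The dispatch described above can be sketched in plain C. This is only an illustration under stated assumptions, not GRUB code: sketch_malloc(), mm_allocate_pages_below, the fake_* helpers and the 100-page threshold are all hypothetical names and values, and the "firmware" allocator is faked with malloc() so the fallback path can be exercised on a normal host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE   4096u
#define LARGE_ALLOC_PAGES  100u            /* assumed threshold, see [1] */
#define LIMIT_4G           0x100000000ull
#define LIMIT_NONE         UINT64_MAX
#define ROUND_TO_PAGES(n)  (((n) + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE)

/* Hypothetical hook a platform would install; NULL means "not available". */
typedef void *(*alloc_pages_below_fn) (uint64_t max_addr, size_t pages);
alloc_pages_below_fn mm_allocate_pages_below = NULL;

/* Fake firmware allocator so the sketch runs on a host system. */
int fake_low_exhausted = 0;     /* pretend memory below 4 GiB is used up */
int fake_calls = 0;             /* number of times the hook was called */
uint64_t fake_last_limit = 0;   /* address limit of the most recent call */

void *
fake_alloc_pages_below (uint64_t max_addr, size_t pages)
{
  fake_calls++;
  fake_last_limit = max_addr;
  if (max_addr <= LIMIT_4G && fake_low_exhausted)
    return NULL;
  return malloc (pages * SKETCH_PAGE_SIZE);
}

/* Large requests bypass the normal heap: try below 4 GiB first, then
   anywhere.  Small requests use the ordinary allocator (malloc here). */
void *
sketch_malloc (size_t size)
{
  if (size > LARGE_ALLOC_PAGES * SKETCH_PAGE_SIZE
      && mm_allocate_pages_below != NULL)
    {
      size_t pages = ROUND_TO_PAGES (size);
      void *ret = mm_allocate_pages_below (LIMIT_4G, pages);
      if (ret == NULL)
        ret = mm_allocate_pages_below (LIMIT_NONE, pages);
      return ret;
    }
  return malloc (size);
}
```

With the hook set to fake_alloc_pages_below and fake_low_exhausted set to 1, a 200-page request makes two hook calls, the second one without an address limit. A 64-byte request never touches the hook.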
We then also patch grub_file_read() to check whether the target buffer
is located above 4GB and, if so, to copy the data through a bounce
buffer, which avoids even more of those issues. So we add something
like this to the start of it:
  if ((grub_addr_t) buf > @4GB@) {
    return read_buffered(file, buf, len);
  }
where read_buffered() is like the one in rhboot's EFI loader:
  #define BOUNCE_BUFFER_MAX 0x1000000ull

  static grub_ssize_t
  read_buffered(grub_file_t file, grub_uint8_t *bufp, grub_size_t len)
  {
    grub_ssize_t bufpos = 0;
    /* The bounce buffer is allocated once and reused across calls. */
    static grub_size_t bbufsz = 0;
    static char *bbuf = NULL;

    if (bbufsz == 0)
      bbufsz = MIN(BOUNCE_BUFFER_MAX, len);

    /* Halve the requested size until the allocation succeeds. */
    while (!bbuf && bbufsz)
      {
        bbuf = grub_malloc(bbufsz);
        if (!bbuf)
          bbufsz >>= 1;
      }

    if (!bbuf)
      {
        grub_error (GRUB_ERR_OUT_OF_MEMORY,
                    N_("cannot allocate bounce buffer"));
        /* Do not fall through and read into a NULL buffer. */
        return -1;
      }

    while (bufpos < (grub_ssize_t) len)
      {
        grub_ssize_t sz;

        sz = grub_file_read (file, bbuf, MIN(bbufsz, len - bufpos));
        if (sz < 0)
          return sz;
        if (sz == 0)
          break;

        grub_memcpy(bufp + bufpos, bbuf, sz);
        bufpos += sz;
      }

    return bufpos;
  }
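One detail about the address check in grub_file_read(): comparing only the start of the buffer against 4GB misses a buffer that begins below the boundary and extends across it. A minimal sketch of a range-aware check follows. buffer_reaches_above_4g() is a hypothetical helper, not an existing GRUB function, and it uses plain C99 types in place of grub_addr_t and grub_size_t.

```c
#include <stdint.h>

#define ADDR_4G 0x100000000ull

/* Return 1 if any byte of [addr, addr + len) lies at or above 4 GiB.
   The comparison is arranged so that addr + len is never computed and
   therefore cannot overflow. */
int
buffer_reaches_above_4g (uint64_t addr, uint64_t len)
{
  if (addr >= ADDR_4G)
    return 1;
  /* addr < 4 GiB here, so the subtraction cannot wrap around. */
  return len > ADDR_4G - addr;
}
```

A buffer whose last byte is at 0xFFFFFFFF is still treated as fully below the boundary; one byte more and it would go through the bounce buffer.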
Now we still end up allocating each file twice, but the verifier copy
is allocated from, and released back to, the EFI firmware. This means
that we allocate far fewer regions ourselves and have outsourced the
problem of releasing the memory after it has been used to the firmware :)
Of course ultimately we would want to avoid the double
allocation altogether, so it might make sense to provide
a way to directly allocate the buffer we need, such as:
  typedef void *(*grub_allocator)(grub_size_t bytes);
  grub_file_t grub_file_open_alloc(const char *name,
                                   enum grub_file_type type,
                                   grub_allocator allocator);
or a function that simply reads a file at a path into a buffer:
  void *grub_file_open_read_close(const char *name,
                                  enum grub_file_type type,
                                  grub_allocator allocator);
The latter simply allocates the buffer by calling the allocator,
reads the file into it, and then verifies the content using the
verifier framework before returning it.
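To make the allocate, read, verify sequence concrete, here is a host-side sketch that uses stdio in place of the GRUB file layer. Everything in it is an assumption for illustration: sketch_open_read_close(), the release callback, the size_out parameter and sketch_verify_nonempty() are hypothetical and are not part of the proposed prototype, which takes only a name, a file type and an allocator.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef void *(*sketch_allocator) (size_t bytes);
typedef void (*sketch_release) (void *buf);
/* A verifier returns 0 to accept the content, non-zero to reject it. */
typedef int (*sketch_verifier) (const void *buf, size_t len);

/* Example verifier: rejects empty files, accepts everything else. */
int
sketch_verify_nonempty (const void *buf, size_t len)
{
  (void) buf;
  return len > 0 ? 0 : -1;
}

/* Read the whole file into a buffer obtained from the caller's allocator,
   verify it in place, and return it.  Only one buffer is ever allocated. */
void *
sketch_open_read_close (const char *name, sketch_allocator allocator,
                        sketch_release release, sketch_verifier verify,
                        size_t *size_out)
{
  FILE *f = fopen (name, "rb");
  void *buf = NULL;
  long size;

  if (f == NULL)
    return NULL;

  if (fseek (f, 0, SEEK_END) != 0)
    goto fail;
  size = ftell (f);
  if (size < 0 || fseek (f, 0, SEEK_SET) != 0)
    goto fail;

  /* Request at least one byte so an empty file still gets a buffer. */
  buf = allocator (size > 0 ? (size_t) size : 1);
  if (buf == NULL)
    goto fail;

  if (fread (buf, 1, (size_t) size, f) != (size_t) size)
    goto fail;
  if (verify != NULL && verify (buf, (size_t) size) != 0)
    goto fail;

  fclose (f);
  *size_out = (size_t) size;
  return buf;

fail:
  if (buf != NULL)
    release (buf);
  fclose (f);
  return NULL;
}
```

On a host, malloc and free can be passed directly as the allocator and release callbacks. In a loader, the allocator would be whatever function implements the kernel's placement policy for the initrd.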
So if we want to load an initrd, we write a function that
allocates an initrd using whatever policies the kernel needs
there, and then do
  initrd_buf = grub_file_open_read_close(path,
      GRUB_FILE_TYPE_LINUX_INITRD | GRUB_FILE_TYPE_NO_DECOMPRESS,
      initrd_alloc);
and then we're done and don't need to allocate and read
each file twice.
But that seems like a 2nd step that's a bit more complex
than bypassing the MM for large allocations and using bounce
buffers for >4GB targets in grub_file_read().
[1] What is large? Perhaps it's just 100 pages, perhaps it's
4 MB. It depends on how different the performance is for
the EFI call round trip vs doing it in our mm.
--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en