On Fri, Dec 06, 2024 at 02:19:06AM +0300, Daniil Tatianin wrote:
Currently, passing mem-lock=on to QEMU causes memory usage to grow by
huge amounts:
no memlock:
$ qemu-system-x86_64 -overcommit mem-lock=off
$ ps -p $(pidof ./qemu-system-x86_64) -o rss=
45652
$ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
$ ps -p $(pidof ./qemu-system-x86_64) -o rss=
39756
memlock:
$ qemu-system-x86_64 -overcommit mem-lock=on
$ ps -p $(pidof ./qemu-system-x86_64) -o rss=
1309876
$ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
$ ps -p $(pidof ./qemu-system-x86_64) -o rss=
259956
This happens because mlockall(2) immediately write-faults every
existing and future anonymous mapping in the process.
One of the reasons to enable mem-lock is to protect a QEMU process's
pages from being compacted and migrated by kcompactd. kcompactd does so
by modifying a live process's page tables, which causes thousands of TLB
flush IPIs per second and basically steals all guest time while it's
active.
mem-lock=on helps against this (given compact_unevictable_allowed is 0),
but the memory overhead it introduces is an undesirable side effect.
We can avoid it completely by passing MCL_ONFAULT to mlockall, which
this series makes possible through a new command line option called
mem-lock-onfault.
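
For reference, the difference between the two modes comes down to one
flag passed to mlockall(2). Below is a minimal sketch, assuming Linux
headers that define MCL_ONFAULT (Linux 4.4 or later). The helper names
mlock_flags() and lock_all_memory() are made up for illustration; this
is not the code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: build the mlockall() flags for the two modes.
 * Returns -1 if on-fault locking is requested but the headers do not
 * provide MCL_ONFAULT.
 */
int mlock_flags(int on_fault)
{
    int flags = MCL_CURRENT | MCL_FUTURE;

    assert(on_fault == 0 || on_fault == 1);

    if (on_fault) {
#ifdef MCL_ONFAULT
        /* Lock pages only once they have been faulted in. */
        flags |= MCL_ONFAULT;
#else
        return -1;
#endif
    }
    return flags;
}

/*
 * Hypothetical helper: lock all current and future mappings.
 * Returns 0 on success or a negative errno value on failure
 * (commonly -ENOMEM or -EPERM when RLIMIT_MEMLOCK is too low).
 */
int lock_all_memory(int on_fault)
{
    int flags = mlock_flags(on_fault);

    if (flags < 0) {
        return -ENOTSUP;
    }
    if (mlockall(flags) != 0) {
        return -errno;
    }
    return 0;
}
```

Calling lock_all_memory(0) populates every mapping up front, while
lock_all_memory(1) leaves untouched pages unpopulated until first use.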
IMHO it's always helpful to dig in and explain why such a difference
exists. E.g. guest mem should normally be the major mem sink, and that
definitely won't be affected by whether MCL_ONFAULT is used or not.
I had a quick look at tcg specifically (as that really surprised me a bit..).
When you look at the mappings, there's a constant 1G shmem mapping that
always gets locked and populated.
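
One way to spot such a mapping is to scan /proc/<pid>/smaps for the
entry with the largest "Locked:" value. A rough sketch follows, assuming
the usual smaps layout (a header line per mapping followed by
"Key: value kB" lines). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one smaps entry. */
struct map_info {
    char header[512];   /* "start-end perms offset dev inode path" */
    long rss_kb;
    long locked_kb;
};

/*
 * Scan an smaps-style file and store the mapping with the largest
 * "Locked:" value in *best. Returns 0 if at least one mapping was
 * found, -1 otherwise.
 */
int largest_locked_mapping(const char *path, struct map_info *best)
{
    struct map_info cur;
    unsigned long start, end;
    char line[512], perms[5];
    int have_cur = 0, found = 0;
    long kb;
    FILE *f;

    assert(path != NULL && best != NULL);

    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    memset(&cur, 0, sizeof(cur));
    memset(best, 0, sizeof(*best));

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3) {
            /* A new mapping starts: commit the previous one first. */
            if (have_cur && (!found || cur.locked_kb > best->locked_kb)) {
                *best = cur;
                found = 1;
            }
            memset(&cur, 0, sizeof(cur));
            line[strcspn(line, "\n")] = '\0';
            snprintf(cur.header, sizeof(cur.header), "%s", line);
            have_cur = 1;
        } else if (have_cur) {
            if (sscanf(line, "Rss: %ld kB", &kb) == 1) {
                cur.rss_kb = kb;
            } else if (sscanf(line, "Locked: %ld kB", &kb) == 1) {
                cur.locked_kb = kb;
            }
        }
    }
    /* Commit the final mapping in the file. */
    if (have_cur && (!found || cur.locked_kb > best->locked_kb)) {
        *best = cur;
        found = 1;
    }
    fclose(f);
    return found ? 0 : -1;
}
```

Pointing it at /proc/<pid>/smaps of a running QEMU should report which
mapping accounts for most of the locked memory.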
It turns out to be tcg's jit buffer, alloc_code_gen_buffer_splitwx_memfd:
    buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
    if (buf_rw == NULL) {
        goto fail;
    }

    buf_rx = mmap(NULL, size, host_prot_read_exec(), MAP_SHARED, fd, 0);
    if (buf_rx == MAP_FAILED) {
        error_setg_errno(errp, errno,
                         "failed to map shared memory for execute");
        goto fail;
    }
Looks like that's the major reason why mlockall bloats tcg by a roughly
constant 1G: the buffer seems to be allocated from tcg_init_machine().
I didn't check kvm.
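
To illustrate why this buffer shows up as shmem, here is a standalone
sketch of the same split read-write / read-execute pattern using plain
memfd_create(2) instead of QEMU's qemu_memfd_alloc(). It assumes glibc
2.27 or later; struct dual_map and its helpers are hypothetical names.
Both views are MAP_SHARED mappings of one memfd, so the memory is
shmem-backed rather than anonymous.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of views onto the same memfd-backed pages. */
struct dual_map {
    unsigned char *rw;   /* writable view */
    unsigned char *rx;   /* read+execute view of the same pages */
    size_t size;
};

/* Returns 0 on success, -1 on failure. */
int dual_map_create(size_t size, struct dual_map *m)
{
    void *rw, *rx;
    int fd;

    assert(m != NULL && size > 0);

    fd = memfd_create("jit-sketch", MFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED) {
        close(fd);
        return -1;
    }
    rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);   /* the mappings keep the memory alive */
    if (rx == MAP_FAILED) {
        munmap(rw, size);
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->size = size;
    return 0;
}

void dual_map_destroy(struct dual_map *m)
{
    munmap(m->rw, m->size);
    munmap(m->rx, m->size);
}
```

A byte written through rw is immediately visible through rx, and with
mlockall(MCL_CURRENT) in effect the whole region gets populated even if
nothing has touched it yet.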
Logically, having an on-fault option won't ever hurt, so it's probably not
an issue to have it anyway. Still, I'm sharing my finding above, as IIUC
that's mostly why it was bloated for tcg, so maybe there are other options too.