[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] osdep: add a qemu_close_all_open_fd() helper
From: |
Clément Léger |
Subject: |
Re: [PATCH v2] osdep: add a qemu_close_all_open_fd() helper |
Date: |
Tue, 16 Jul 2024 15:42:49 +0200 |
User-agent: |
Mozilla Thunderbird |
On 11/07/2024 20:43, Richard Henderson wrote:
> On 6/18/24 04:17, Clément Léger wrote:
>> Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on
>> POSIX"), the maximum number of file descriptors that can be opened are
>> raised to nofile.rlim_max. On recent debian distro, this yield a maximum
>> of 1073741816 file descriptors. Now, when forking to start
>> qemu-bridge-helper, this actually calls close() on the full possible file
>> descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which
>> takes a considerable amount of time. In order to reduce that time,
>> factorize existing code to close all open files descriptors in a new
>> qemu_close_all_open_fd() function. This function uses various methods
>> to close all the open file descriptors ranging from the most efficient
>> one to the least one. It also accepts an ordered array of file
>> descriptors that should not be closed since this is required by the
>> callers that calls it after forking.
>>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>>
>> ----
>>
>> v2:
>> - Factorize async_teardown.c close_fds implementation as well as
>> tap.c ones
>> - Apply checkpatch
>> - v1:
>> https://lore.kernel.org/qemu-devel/20240617162520.4045016-1-cleger@rivosinc.com/
>>
>> ---
>> include/qemu/osdep.h | 8 +++
>> net/tap.c | 31 ++++++-----
>> system/async-teardown.c | 37 +------------
>> util/osdep.c | 115 ++++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 141 insertions(+), 50 deletions(-)
>>
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index f61edcfdc2..9369a97d3d 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -755,6 +755,14 @@ static inline void qemu_reset_optind(void)
>> int qemu_fdatasync(int fd);
>> +/**
>> + * Close all open file descriptors except the ones supplied in the
>> @skip array
>> + *
>> + * @skip: ordered array of distinct file descriptors that should not
>> be closed
>> + * @nskip: number of entries in the @skip array.
>> + */
>> +void qemu_close_all_open_fd(const int *skip, unsigned int nskip);
>> +
>> /**
>> * Sync changes made to the memory mapped file back to the backing
>> * storage. For POSIX compliant systems this will fallback
>> diff --git a/net/tap.c b/net/tap.c
>> index 51f7aec39d..6fc3939078 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -385,6 +385,21 @@ static TAPState *net_tap_fd_init(NetClientState
>> *peer,
>> return s;
>> }
>> +static void close_all_fds_after_fork(int excluded_fd)
>> +{
>> + const int skip_fd[] = {0, 1, 2, 3, excluded_fd};
>
> 3 should not be included here...
Oh yes, my bad.
>
>> + unsigned int nskip = ARRAY_SIZE(skip_fd);
>> +
>> + /*
>> + * skip_fd must be an ordered array of distinct fds, exclude
>> + * excluded_fd if already included in the [0 - 3] range
>> + */
>> + if (excluded_fd <= 3) {
>
> or here -- stdin is 0, stdout is 1, stderr is 2.
>
> Perhaps we need reminding of this and use the STD*_FILENO names instead
> of raw integer constants.
Indeed, I'll switch to using STD*_FILENO.
>
>
>> @@ -400,13 +415,7 @@ static void launch_script(const char
>> *setup_script, const char *ifname,
>> return;
>> }
>> if (pid == 0) {
>> - int open_max = sysconf(_SC_OPEN_MAX), i;
>> -
>> - for (i = 3; i < open_max; i++) {
>> - if (i != fd) {
>> - close(i);
>> - }
>> - }
>
> Note that the original *does* close 3.
>
>> +#ifdef CONFIG_LINUX
>> +static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int
>> nskip)
>> +{
>> + struct dirent *de;
>> + int fd, dfd;
>> + bool close_fd;
>> + DIR *dir;
>> + int i;
>> +
>> + dir = opendir("/proc/self/fd");
>> + if (!dir) {
>> + /* If /proc is not mounted, there is nothing that can be
>> done. */
>> + return false;
>> + }
>> + /* Avoid closing the directory. */
>> + dfd = dirfd(dir);
>> +
>> + for (de = readdir(dir); de; de = readdir(dir)) {
>> + fd = atoi(de->d_name);
>> + close_fd = true;
>> + if (fd == dfd) {
>> + close_fd = false;
>> + } else {
>> + for (i = 0; i < nskip; i++) {
>
> The skip list is sorted, so you should remember the point of the last
> search and begin from there, and you should not search past fd < skip[i].
readdir values are not ordered so the best I can do is restrict
start/end once found in the fds and bail out when < skip[i] as you
pointed out.
for (i = skip_start; i < skip_end; i++) {
if (fd < skip[i]) {
/* We are below the next skipped fd, break */
break;
} else if (fd == skip[i]) {
close_fd = false;
/* Restrict the range as we found fds matching start/end */
if (i == skip_start)
skip_start++;
else if (i == skip_end)
skip_end--;
break;
}
}
>
>
>> +#else
>> +static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int
>> nskip)
>> +{
>> + return false;
>> +}
>> +#endif
>
> I'm not fond of duplicating the function declaration.
> I think it's better to move the ifdef inside:
>
> static bool foo(...)
> {
> #ifdef XYZ
> impl
> #else
> stub
> #endif
> }
>
Acked.
>> +
>> +#ifdef CONFIG_CLOSE_RANGE
>> +static bool qemu_close_all_open_fd_close_range(const int *skip,
>> + unsigned int nskip)
>> +{
>> + int max_fd = sysconf(_SC_OPEN_MAX) - 1;
>> + int first = 0, last = max_fd;
>> + int cur_skip = 0, ret;
>> +
>> + do {
>> + if (nskip) {
>> + while (first == skip[cur_skip]) {
>> + cur_skip++;
>> + first++;
>> + }
>
> This fails to check cur_skip < nskip in the loop.
> Mixing signed cur_skip with unsigned nskip is bad.
>
> There seems to be no good reason for the separate "if (nskip)" check.
> A proper check for cur_skip < nskip will work just fine with nskip == 0.
I'll try to rework that.
>
>> + /* Fallback */
>> + for (i = 0; i < open_max; i++) {
>> + if (i == skip[cur_skip]) {
>> + cur_skip++;
>> + continue;
>> + }
>> + close(i);
>> + }
>
> Missing nskip test.
I'll fix that.
Thanks,
Clément
>
>
> r~
>