From: Jitendra Kolhe
Subject: Re: [Qemu-devel] [PATCH v2] mem-prealloc: reduce large guest start-up and migration time.
Date: Tue, 14 Feb 2017 11:53:11 +0530
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
On 2/13/2017 9:22 PM, Jitendra Kolhe wrote:
> On 2/13/2017 5:34 PM, Igor Mammedov wrote:
>> On Mon, 13 Feb 2017 11:23:17 +0000
>> "Daniel P. Berrange" <address@hidden> wrote:
>>
>>> On Mon, Feb 13, 2017 at 11:45:46AM +0100, Igor Mammedov wrote:
>>>> On Mon, 13 Feb 2017 14:30:56 +0530
>>>> Jitendra Kolhe <address@hidden> wrote:
>>>>
>>>>> Using "-mem-prealloc" option for a large guest leads to higher guest
>>>>> start-up and migration time. This is because with "-mem-prealloc" option
>>>>> qemu tries to map every guest page (create address translations), and
>>>>> make sure the pages are available during runtime. virsh/libvirt by
>>>>> default, seems to use "-mem-prealloc" option in case the guest is
>>>>> configured to use huge pages. The patch tries to map all guest pages
>>>>> simultaneously by spawning multiple threads. Currently the change is
>>>>> limited to QEMU library functions on POSIX-compliant hosts only, as we
>>>>> are not sure if the problem exists on win32. Below are some stats with
>>>>> "-mem-prealloc" option for a guest configured to use huge pages.
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> Idle Guest | Start-up time | Migration time
>>>>> ------------------------------------------------------------------------
>>>>> Guest stats with 2M HugePage usage - single threaded (existing code)
>>>>> ------------------------------------------------------------------------
>>>>> 64 Core - 4TB | 54m11.796s | 75m43.843s
>>>>> 64 Core - 1TB | 8m56.576s | 14m29.049s
>>>>> 64 Core - 256GB | 2m11.245s | 3m26.598s
>>>>> ------------------------------------------------------------------------
>>>>> Guest stats with 2M HugePage usage - map guest pages using 8 threads
>>>>> ------------------------------------------------------------------------
>>>>> 64 Core - 4TB | 5m1.027s | 34m10.565s
>>>>> 64 Core - 1TB | 1m10.366s | 8m28.188s
>>>>> 64 Core - 256GB | 0m19.040s | 2m10.148s
>>>>> -----------------------------------------------------------------------
>>>>> Guest stats with 2M HugePage usage - map guest pages using 16 threads
>>>>> -----------------------------------------------------------------------
>>>>> 64 Core - 4TB | 1m58.970s | 31m43.400s
>>>>> 64 Core - 1TB | 0m39.885s | 7m55.289s
>>>>> 64 Core - 256GB | 0m11.960s | 2m0.135s
>>>>> -----------------------------------------------------------------------
>>>>>
>>>>> Changed in v2:
>>>>> - modify number of memset threads spawned to min(smp_cpus, 16).
>>>>> - removed 64GB memory restriction for spawning memset threads.
>>>>>
>>>>> Signed-off-by: Jitendra Kolhe <address@hidden>
>>>>> ---
>>>>> backends/hostmem.c | 4 ++--
>>>>> exec.c | 2 +-
>>>>> include/qemu/osdep.h | 3 ++-
>>>>> util/oslib-posix.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++-----
>>>>> util/oslib-win32.c | 3 ++-
>>>>> 5 files changed, 69 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>>> index 7f5de70..162c218 100644
>>>>> --- a/backends/hostmem.c
>>>>> +++ b/backends/hostmem.c
>>>>> @@ -224,7 +224,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>>>>> void *ptr = memory_region_get_ram_ptr(&backend->mr);
>>>>> uint64_t sz = memory_region_size(&backend->mr);
>>>>>
>>>>> - os_mem_prealloc(fd, ptr, sz, &local_err);
>>>>> + os_mem_prealloc(fd, ptr, sz, smp_cpus, &local_err);
>>>>> if (local_err) {
>>>>> error_propagate(errp, local_err);
>>>>> return;
>>>>> @@ -328,7 +328,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>>>>> */
>>>>> if (backend->prealloc) {
>>>>> os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
>>>>> - &local_err);
>>>>> + smp_cpus, &local_err);
>>>>> if (local_err) {
>>>>> goto out;
>>>>> }
>>>>> diff --git a/exec.c b/exec.c
>>>>> index 8b9ed73..53afcd2 100644
>>>>> --- a/exec.c
>>>>> +++ b/exec.c
>>>>> @@ -1379,7 +1379,7 @@ static void *file_ram_alloc(RAMBlock *block,
>>>>> }
>>>>>
>>>>> if (mem_prealloc) {
>>>>> - os_mem_prealloc(fd, area, memory, errp);
>>>>> + os_mem_prealloc(fd, area, memory, smp_cpus, errp);
>>>>> if (errp && *errp) {
>>>>> goto error;
>>>>> }
>>>>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>>>> index 56c9e22..fb1d22b 100644
>>>>> --- a/include/qemu/osdep.h
>>>>> +++ b/include/qemu/osdep.h
>>>>> @@ -401,7 +401,8 @@ unsigned long qemu_getauxval(unsigned long type);
>>>>>
>>>>> void qemu_set_tty_echo(int fd, bool echo);
>>>>>
>>>>> -void os_mem_prealloc(int fd, char *area, size_t sz, Error **errp);
>>>>> +void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
>>>>> + Error **errp);
>>>>>
>>>>> int qemu_read_password(char *buf, int buf_size);
>>>>>
>>>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>>>> index f631464..17da029 100644
>>>>> --- a/util/oslib-posix.c
>>>>> +++ b/util/oslib-posix.c
>>>>> @@ -55,6 +55,16 @@
>>>>> #include "qemu/error-report.h"
>>>>> #endif
>>>>>
>>>>> +#define MAX_MEM_PREALLOC_THREAD_COUNT 16
>>>> Running with -smp 16 or bigger on a host with fewer than 16 CPUs
>>>> would be not quite optimal. Why not change the
>>>> MAX_MEM_PREALLOC_THREAD_COUNT constant to something like
>>>> sysconf(_SC_NPROCESSORS_ONLN)?
>>>
>>> The point is to not consume more host resources than would otherwise
>>> be consumed by running the guest CPUs. ie, if running a KVM guest
>>> with -smp 4 on a 16 CPU host, QEMU should not consume more than
>>> 4 pCPUs worth of resources on the host. Using sysconf would cause
>>> QEMU to consume all host resources, likely harming other
>>> guests' workloads.
>>>
>>> If the person launching QEMU gives a -smp value that's larger than
>>> the host CPUs count, then they've already accepted that they're
>>> asking QEMU to do more than the host is really capable of. IOW, I
>>> don't think we need to special case memsetting for that, since
>>> VCPU execution itself is already going to overcommit the host.
>> Doing overcommit at preallocate time doesn't make much sense.
>> If MAX_MEM_PREALLOC_THREAD_COUNT is replaced with
>> sysconf(_SC_NPROCESSORS_ONLN),
>> then QEMU will end up with MIN(-smp, sysconf(_SC_NPROCESSORS_ONLN)),
>> which will put a cap on the upper value and avoid useless overcommit
>> at preallocate time.
>>
>
> I agree, we should consider the case where we run with -smp >= 16,
> which is overcommitted on a host with < 16 cpus. At the same time we
> should also be sure that we don't end up spawning too many memset
> threads. For example, I have been running fat guests with -smp > 64
> on hosts with 384 cpus.
>
How about putting a cap on MAX_MEM_PREALLOC_THREAD_COUNT at
(MIN(sysconf(_SC_NPROCESSORS_ONLN), 16))?
The number of memset threads can then be calculated as
MIN(smp_cpus, MAX_MEM_PREALLOC_THREAD_COUNT).
Thanks,
- Jitendra
> Thanks,
> - Jitendra
>
>>> Regards,
>>> Daniel
>>
>