Re: [Qemu-devel] Overcommiting cpu results in all vms offline
From: Stefan Priebe - Profihost AG
Subject: Re: [Qemu-devel] Overcommiting cpu results in all vms offline
Date: Mon, 17 Sep 2018 11:32:48 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
Maybe a missing piece:
vm.overcommit_memory=0
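
For anyone comparing hosts, here is a minimal sketch for collecting the settings mentioned in this thread in one go. The function names (active_choice, kb_to_gib, report_settings) are made up for illustration and are not part of any tool; the paths are the usual /proc/sys/vm and transparent_hugepage sysfs locations. The kernel marks the active THP choice with square brackets, and vm.min_free_kbytes is given in kB, so 10567004 works out to roughly 10.08 GiB.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not an existing tool).

# Print the bracketed (active) choice from a THP sysfs line read on stdin,
# e.g. "always defer [defer+madvise] madvise never" -> "defer+madvise".
active_choice() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Convert a kB value (such as vm.min_free_kbytes) to GiB, two decimals.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1048576 }'
}

# Print the VM and THP settings discussed here; unreadable files are skipped.
report_settings() {
    for key in overcommit_memory dirty_background_ratio dirty_ratio min_free_kbytes; do
        f="/proc/sys/vm/$key"
        [ -r "$f" ] && printf 'vm.%s = %s\n' "$key" "$(cat "$f")"
    done
    for name in enabled defrag; do
        f="/sys/kernel/mm/transparent_hugepage/$name"
        [ -r "$f" ] && printf 'thp.%s = %s\n' "$name" "$(active_choice < "$f")"
    done
    return 0
}

report_settings
```

Running it on the affected host and on a healthy one gives two short listings that can be diffed directly.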
Greets,
Stefan
On 17.09.2018 at 09:00, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> On 17.09.2018 at 08:38, Jack Wang wrote:
>> Stefan Priebe - Profihost AG <address@hidden> wrote on Sun, 16 Sep 2018 at 3:31 PM:
>>>
>>> Hello,
>>>
>>> while overcommitting CPU I had several situations where all VMs went offline
>>> while two VMs saturated all cores.
>>>
>>> I had expected all VMs to stay online and just not be able to use all
>>> their cores.
>>>
>>> My original idea was to automate live migration on high host load to move
>>> VMs to another node, but that only makes sense if all VMs stay online.
>>>
>>> Is this expected? Is anything special needed to achieve this?
>>>
>>> Greets,
>>> Stefan
>>>
>> Hi, Stefan,
>>
>> Do you have any logs from when all the VMs went offline?
>> Maybe the OOM killer played a role there?
>
> After reviewing, I think this is memory related, but the OOM killer did not
> play a role. All kvm processes were spinning, trying to get > 100% CPU, and
> I was not even able to log in via ssh. After 5-10 minutes I was able to log in.
>
> There was about 150GB of free memory.
>
> Relevant settings (no local storage involved):
> vm.dirty_background_ratio: 3
> vm.dirty_ratio: 10
> vm.min_free_kbytes: 10567004
>
> # cat /sys/kernel/mm/transparent_hugepage/defrag
> always defer [defer+madvise] madvise never
>
> # cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> After that I had the following traces on the host node:
> https://pastebin.com/raw/0VhyQmAv
>
> Thanks!
>
> Greets,
> Stefan
>