From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
Date: Mon, 15 Jun 2015 09:01:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

Am 13.06.2015 um 22:10 schrieb Michael S. Tsirkin:
> On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
>> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
>>> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
>>>> On 09/06/15 13:37, Christian Borntraeger wrote:
>>>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
>>>>>> Excessive virtio_balloon inflation can cause invocation of the OOM-killer
>>>>>> when Linux is under severe memory pressure. Various mechanisms are
>>>>>> responsible for correct virtio_balloon memory management. Nevertheless, it
>>>>>> is often the case that these control tools do not have enough time to
>>>>>> react to a fast-changing memory load. As a result, the OS runs out of
>>>>>> memory and invokes the OOM-killer. The balancing of memory by use of the
>>>>>> virtio balloon should not cause the termination of processes while there
>>>>>> are pages in the balloon. Currently there is no way for the virtio balloon
>>>>>> driver to free memory at the last moment before some process gets killed
>>>>>> by the OOM-killer.
>>>>>>
>>>>>> This does not introduce a security breach, as the balloon itself runs
>>>>>> inside the Guest OS and works in cooperation with the host. Thus
>>>>>> some improvements from the Guest side should be considered normal.
>>>>>>
>>>>>> To solve the problem, introduce a virtio_balloon callback which is
>>>>>> expected to be called from the oom notifier call chain in out_of_memory()
>>>>>> function. If virtio balloon could release some memory, it will make the
>>>>>> system return and retry the allocation that forced the out of memory
>>>>>> killer to run.
>>>>>>
>>>>>> This behavior should be enabled if and only if appropriate feature bit
>>>>>> is set on the device. It is off by default.
>>>>> The balloon frees pages in this way:
>>>>>
>>>>> static void balloon_page(void *addr, int deflate)
>>>>> {
>>>>> #if defined(__linux__)
>>>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
>>>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
>>>>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>>>>> #endif
>>>>> }
>>>>>
>>>>> The guest can re-touch that page and get an empty zero page or the old
>>>>> page back without tampering with host integrity. This should work for all
>>>>> cases I am aware of (without sync_mmu it's a nop anyway), so why not enable
>>>>> that by default? Anything that I missed?
>>>>>
>>>>> Christian
>>>>
>>>> I'd like to do that :) Actually, the original version of the kernel patch
>>>> enabled this unconditionally. But Michael asked to make
>>>> it configurable and off by default.
>>>>
>>>> Den
>>>
>>> That's not the question here. The question is why it is limited by
>>> kvm_has_sync_mmu.
>>
>> Well we have two interesting options here:
>>
>> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
>>
>> For any sane host with on-demand paging, just re-accessing the page
>> should simply work. So the common case could be
>> VIRTIO_BALLOON_F_MUST_TELL_HOST == off
> 
> Disabling this breaks useful optimizations such as the
> ability not to migrate memory in the balloon.

Memory in the balloon is usually backed by the empty zero page after
the madvise (MADV_DONTNEED will finally result in zap_pte_range for the
common case). In an ideal world, migration should be able to optimize
zero pages.
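
A minimal sketch of that behaviour, assuming Linux and a private anonymous
mapping (the function name dontneed_zeroes_page is hypothetical and this is
not QEMU code): after the page is discarded with MADV_DONTNEED, which is what
QEMU_MADV_DONTNEED maps to on Linux, a later read sees zeros instead of the
old contents.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a private anonymous page reads back as all zeros after
 * MADV_DONTNEED, 0 if any old byte survived, and -1 on a syscall error.
 */
int dontneed_zeroes_page(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t len = (size_t)ps;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Dirty the page so it is backed by real memory. */
    memset(p, 0xAB, len);

    /* Discard it, as the balloon does on inflate. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Re-touch the page by reading; no cooperation from anyone is needed. */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, len);
    return all_zero;
}
```

Calling dontneed_zeroes_page() from a small main() is enough to try it. The
sketch only covers private anonymous memory; file-backed or shared mappings
behave differently.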

 
>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on
> 
> AFAIK management tools depend on balloon not deflating
> below host-specified threshold to avoid OOM on the host.
> So I don't think we can make this a default,
> management needs to enable this explicitly.

If ballooning is required to keep the host's memory management
from hitting OOM (in other words, abusing ballooning as memory hotplug
between guests), then yes, it is better to let the guest OOM. That makes
sense.

Now: I think that doing so (not having enough swap in the host if
all guests deflate) and relying on balloon semantics is fundamentally
broken. Let me explain: the problem is that we rely on guest
cooperation for host integrity. As I explained, madvise with
MADV_DONTNEED replaces the current PTEs with invalid/empty PTEs. As
soon as the guest kernel re-touches the page (e.g. a malicious
kernel module, not the balloon driver), it will be backed by the VMA's
default method, so usually by a shared R/O copy of the empty
zero page. Write accesses will result in a copy-on-write and allocate
new memory in the host.
There is nothing we can do in the balloon protocol to protect the host
against malicious guests allocating all of their maximum memory.
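
The re-touch argument can be observed from user space. The following is a
rough sketch, assuming Linux, a private anonymous mapping standing in for
guest RAM, and mincore() as the residency probe. All names (DEMO_PAGES,
resident_counts, retouch_demo) are hypothetical. It counts resident pages
after the first write, after MADV_DONTNEED, and after writing again without
telling anyone.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_PAGES 16 /* hypothetical size of the "ballooned" region */

struct resident_counts {
    int after_touch;   /* pages resident after the first write */
    int after_discard; /* pages resident after MADV_DONTNEED */
    int after_rewrite; /* pages resident after writing again */
};

/* Count resident pages in [addr, addr + len); returns -1 on error. */
static int count_resident(void *addr, size_t len, size_t page)
{
    unsigned char vec[DEMO_PAGES];
    size_t n = len / page;

    assert(n <= DEMO_PAGES);
    if (mincore(addr, len, vec) != 0)
        return -1;

    int resident = 0;
    for (size_t i = 0; i < n; i++)
        resident += vec[i] & 1; /* bit 0 set means the page is resident */
    return resident;
}

/* Returns 0 and fills *out on success, -1 on any syscall error. */
int retouch_demo(struct resident_counts *out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;
    size_t len = page * DEMO_PAGES;
    int rc = -1;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 1, len); /* the "guest" uses its memory */
    out->after_touch = count_resident(p, len, page);

    if (madvise(p, len, MADV_DONTNEED) != 0) /* the balloon inflates */
        goto done;
    out->after_discard = count_resident(p, len, page);

    memset(p, 2, len); /* the "guest" writes again, unannounced */
    out->after_rewrite = count_resident(p, len, page);

    if (out->after_touch >= 0 && out->after_discard >= 0 &&
        out->after_rewrite >= 0)
        rc = 0;
done:
    munmap(p, len);
    return rc;
}
```

On a typical Linux host one would expect after_discard to drop to 0 and
after_rewrite to climb back to DEMO_PAGES. The memory comes back purely
because the pages were written, which is why the balloon protocol alone
cannot cap what a guest consumes.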

If you need host integrity against guest memory usage, something like
the cgroup memory controller is probably the only reliable way.

> 
>> Only for the rare case of hypervisors without paging or other memory
>> related restrictions we have to enable MUST_TELL_HOST.
>> Now: QEMU knows exactly which case we have, so why not let QEMU tell
>> the guest what the capabilities are (e.g. sync_mmu ---> no need to
>> tell the host)?
>>
>> I can at least imagine that some admin wants to make the oom case
>> configurable, but a sane default seems to be to not kill random
>> guest processes.
>>
>> Christian
> 
> 
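
The proposal quoted above (QEMU tells the guest what the host can do) could
be sketched roughly as follows. This is an illustration only, not QEMU code:
balloon_offered_features and both of its parameters are hypothetical, and the
bit numbers are the ones I believe the Linux virtio_balloon header uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as I understand them from the Linux virtio_balloon header. */
#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2

/*
 * Hypothetical helper: build the feature mask the host would offer.
 *  - host_has_sync_mmu: host pages guest memory on demand, so re-touching
 *    a ballooned page simply works and the guest need not tell the host
 *    before reusing it.
 *  - allow_deflate_on_oom: admin/management policy knob.
 */
uint32_t balloon_offered_features(bool host_has_sync_mmu,
                                  bool allow_deflate_on_oom)
{
    uint32_t features = 0;

    if (!host_has_sync_mmu)
        features |= UINT32_C(1) << VIRTIO_BALLOON_F_MUST_TELL_HOST;
    if (allow_deflate_on_oom)
        features |= UINT32_C(1) << VIRTIO_BALLOON_F_DEFLATE_ON_OOM;

    return features;
}
```

Whether allow_deflate_on_oom should default to true or false is exactly the
open question in this thread; the sketch deliberately leaves that decision to
the caller.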