Re: [Qemu-devel] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear


From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear
Date: Thu, 1 Mar 2018 13:39:50 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0


On 03/01/2018 01:35 PM, Christian Borntraeger wrote:
> 
> 
> On 03/01/2018 01:28 PM, Dr. David Alan Gilbert wrote:
>> * Christian Borntraeger (address@hidden) wrote:
>>>
>>>
>>> On 03/01/2018 12:45 PM, Dr. David Alan Gilbert wrote:
>>>> * Christian Borntraeger (address@hidden) wrote:
>>>>>
>>>>>
>>>>> On 03/01/2018 10:24 AM, Dr. David Alan Gilbert wrote:
>>>>>> * Thomas Huth (address@hidden) wrote:
>>>>>>> On 28.02.2018 20:53, Christian Borntraeger wrote:
>>>>>>>> When a guest reboots with diagnose 308 subcode 3, it requests the memory
>>>>>>>> to be cleared. We have not done so until now. This not only violates the
>>>>>>>> architecture, it also misses the chance to free up that memory on
>>>>>>>> reboot, which would help with host memory overcommitment. By using
>>>>>>>> ram_block_discard_range we can cover both cases.
>>>>>>>
>>>>>>> Sounds like a good idea. I wonder whether that release_all_ram()
>>>>>>> function should maybe rather reside in exec.c, so that other machines
>>>>>>> that want to clear all RAM at reset time can use it, too?
>>>>>>>
>>>>>>>> Signed-off-by: Christian Borntraeger <address@hidden>
>>>>>>>> ---
>>>>>>>>  target/s390x/kvm.c | 19 +++++++++++++++++++
>>>>>>>>  1 file changed, 19 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>>>>>>>> index 8f3a422288..2e145ad5c3 100644
>>>>>>>> --- a/target/s390x/kvm.c
>>>>>>>> +++ b/target/s390x/kvm.c
>>>>>>>> @@ -34,6 +34,8 @@
>>>>>>>>  #include "qapi/error.h"
>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>  #include "qemu/timer.h"
>>>>>>>> +#include "qemu/rcu_queue.h"
>>>>>>>> +#include "sysemu/cpus.h"
>>>>>>>>  #include "sysemu/sysemu.h"
>>>>>>>>  #include "sysemu/hw_accel.h"
>>>>>>>>  #include "hw/boards.h"
>>>>>>>> @@ -41,6 +43,7 @@
>>>>>>>>  #include "sysemu/device_tree.h"
>>>>>>>>  #include "exec/gdbstub.h"
>>>>>>>>  #include "exec/address-spaces.h"
>>>>>>>> +#include "exec/ram_addr.h"
>>>>>>>>  #include "trace.h"
>>>>>>>>  #include "qapi-event.h"
>>>>>>>>  #include "hw/s390x/s390-pci-inst.h"
>>>>>>>> @@ -1841,6 +1844,14 @@ static int kvm_arch_handle_debug_exit(S390CPU *cpu)
>>>>>>>>      return ret;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void release_all_rams(void)
>>>>>>>
>>>>>>> s/rams/ram/ maybe?
>>>>>>>
>>>>>>>> +{
>>>>>>>> +    struct RAMBlock *rb;
>>>>>>>> +
>>>>>>>> +    QLIST_FOREACH_RCU(rb, &ram_list.blocks, next)
>>>>>>>> +        ram_block_discard_range(rb, 0, rb->used_length);
>>>>>>>
>>>>>>> From a coding style point of view, I think there should be curly braces
>>>>>>> around the ram_block_discard_range() call?
>>>>>>
>>>>>> I think this might break if it happens during a postcopy migrate.
>>>>>> The destination CPU is running, so it can do a reboot at just the wrong
>>>>>> time; and then the pages (that are protected by userfaultfd) would get
>>>>>> deallocated and trigger userfaultfd requests if accessed.
>>>>>
>>>>> Yes, userfaultfd/postcopy is really fragile and relies on things that are not
>>>>> necessarily true (e.g. virtio-balloon can also invalidate pages).
>>>>
>>>> That's why we use qemu_balloon_inhibit around postcopy to stop
>>>> ballooning; I'm not aware of anything else that does the same.
>>>
>>> We also have at least the pte_unused thing in mm/rmap.c that clearly
>>> predates userfaultfd. We might need to look into this as well.
>>
>> I've not come across that; what does that do?
> 
> It can drop a page on page-out if the page is no longer of value. It is used by
> the CMMA (guest page hinting) code of s390x.
> 
> See mm/rmap.c in the kernel:
> 
> 
> static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>                      unsigned long address, void *arg)
> {
> [...]
>                 } else if (pte_unused(pteval)) {
>                         /*
>                          * The guest indicated that the page content is of no
>                          * interest anymore. Simply discard the pte, vmscan
>                          * will take care of the rest.
>                          */
>                         dec_mm_counter(mm, mm_counter(page));
>                         /* We have to invalidate as we cleared the pte */
>                         mmu_notifier_invalidate_range(mm, address,
>                                                       address + PAGE_SIZE);
>                 } else if (IS_ENABLED(CONFIG_MIGRATION) &&
>                                 (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
> [...]
> 
> 

Maybe something like this in the kernel

diff --git a/mm/rmap.c b/mm/rmap.c
index 47db27f8049e..9bdf4d448987 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1483,7 +1483,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                                set_pte_at(mm, address, pvmw.pte, pteval);
                        }
 
-               } else if (pte_unused(pteval)) {
+               } else if (pte_unused(pteval) && !vma->vm_userfaultfd_ctx.ctx) {
                        /*
                         * The guest indicated that the page content is of no
                         * interest anymore. Simply discard the pte, vmscan


could help?
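
For readers following the thread: the reason a discard covers both cases from
the patch description (clearing the memory and freeing it on the host) can be
seen in a small userspace sketch. As far as I understand it,
ram_block_discard_range ends up in madvise(MADV_DONTNEED) for anonymous guest
RAM; that is an assumption about the QEMU side, not something shown in this
thread. The helper name discard_reads_back_zero is hypothetical and the code is
Linux-specific. It is not QEMU code, only an illustration of the semantics.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only (not part of QEMU or the kernel).
 *
 * Dirties an anonymous private mapping of `len` bytes, discards it with
 * MADV_DONTNEED, then checks what a later read observes.
 *
 * Returns 1 if every byte reads back as zero after the discard,
 *         0 if any stale data survived,
 *        -1 if mmap() or madvise() failed.
 */
int discard_reads_back_zero(size_t len)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return -1;
    }

    /* Touch every page so the host really has to back the range. */
    memset(buf, 0xaa, len);

    /* Drop the backing pages; the mapping itself stays valid. */
    if (madvise(buf, len, MADV_DONTNEED) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* The next access faults in fresh zero-filled pages. */
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(buf, len);
    return all_zero;
}
```

Calling discard_reads_back_zero(1 << 20) should return 1 on Linux. The read
after the discard is also the access that would raise a userfaultfd request if
the range were registered for missing-page faults, which is the postcopy
concern raised earlier in the thread.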
