qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings


From: David Hildenbrand
Subject: Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
Date: Tue, 25 Feb 2020 08:44:56 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 24.02.20 23:49, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
>> When we partially change mappings (esp., mmap over parts of an existing
>> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
>> registered, the handler will implicitly be unregistered from the parts that
>> changed.
>>
>> Trying to place pages onto mappings where there is no longer a handler
>> registered will fail. Let's make sure that any waiter is woken up - we
>> have to do that manually.
>>
>> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
>>
>> This is mainly a preparation for RAM blocks with resizable allcoations,
>> where the mapping of the invalid RAM range will change. The source will
>> keep sending pages that are outside of the new (shrunk) RAM size. We have
>> to treat these pages like they would have been migrated, but can
>> essentially simply drop the content (ignore the placement error).
>>
>> Keep printing a warning on EINVAL, to avoid hiding other (programming)
>> issues. ENOENT is unique.
>>
>> Cc: "Dr. David Alan Gilbert" <address@hidden>
>> Cc: Juan Quintela <address@hidden>
>> Cc: Peter Xu <address@hidden>
>> Cc: Andrea Arcangeli <address@hidden>
>> Signed-off-by: David Hildenbrand <address@hidden>
>> ---
>>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index c68caf4e42..f023830b9a 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>>      range_struct.start = (uintptr_t)host_addr;
>>      range_struct.len = length;
>>  
>> +    /*
>> +     * In case the mapping was partially changed since we enabled userfault
>> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already 
>> removed
>> +     * for the mappings that changed. Unregistering will, however, still 
>> work
>> +     * and ignore mappings without a registered handler.
>> +     */
> 
> Ideally we should still only unregister what we have registered.
> After all we do have this information because we know what we
> registered, we know what has unmapped (in your new resize() hook, when
> postcopy_state==RUNNING).

Not in the case of qemu_ram_remap(). And whatever you propose will
require synchronization (see my other mail) and more complicated
handling than this. uffd allows you to handle races with mmap changes in
a very elegant way (e.g., -ENOENT, or unregisterignoring changed mappings).

> 
> An extreme example is when we register with pages in range [A, B),
> then shrink it to [A, C), then we mapped something else within [C, B)
> (note, with virtio-mem logically B can be very big and C can be very
> small, it means [B, C) can cover quite some address space). Then if:
> 
>   - [C, B) memory type is not compatible with uffd, or

That will never happen in the near future. Without resizable allocations:
- All memory is either anonymous or from a single fd

In addition, right now, only anonymous memory can be used for resizable
RAM. However, with resizable allocations we could have:
- All used_length memory is either anonymous or from a single fd
- All remaining memory is either anonymous or from a single fd

Everything else does not make any sense IMHO and I don't think this is
relevant long term. You cannot arbitrarily map things into the
used_length part of a RAMBlock. That would contradict to its page_size
and its fd. E.g., you would break qemu_ram_remap().

> 
>   - [C, B) could be registered with uffd again due to some other
>     reason (so far QEMU should not have such a reason)

Any code that wants to make use of uffd properly has to synchronize
against postcopy code either way IMHO. It just doesn't work otherwise.

E.g., once I would use it to protect unplugged memory in virtio-mem
(something I am looking into right now and teaching QEMU not to touch
all RAMBlock memory is complicated :) ), virtio-mem would unregister any
uffd handler when notified that postcopy will start, and re-register
after postcopy finished.

[...]


>>  
>> +static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
>> +                               uint64_t pagesize)
>> +{
>> +    struct uffdio_range range = {
>> +        .start = (uint64_t)(uintptr_t)host_addr,
>> +        .len = pagesize,
>> +    };
>> +
>> +    return ioctl(userfault_fd, UFFDIO_WAKE, &range);
>> +}
>> +
>>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>                                 void *from_addr, uint64_t pagesize, RAMBlock 
>> *rb)
>>  {
>> @@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void 
>> *host_addr,
>>          zero_struct.mode = 0;
>>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>>      }
>> +
>> +    /*
>> +     * When the mapping gets partially changed (e.g., qemu_ram_remap()) 
>> before
>> +     * we try to place a page, the userfaultfd handler will be removed for 
>> the
>> +     * changed mappings and placing pages will fail. We can safely ignore 
>> this,
>> +     * because mappings that changed on the destination don't need data 
>> from the
>> +     * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that 
>> page
>> +     * (unlikely but possible). Waking up waiters is always possible, even
>> +     * without a registered userfaultfd handler.
>> +     *
>> +     * Old kernels report EINVAL, new kernels report ENOENT in case there is
>> +     * no longer a userfaultfd handler for a mapping.
>> +     */
>> +    if (ret && (errno == ENOENT || errno == EINVAL)) {
>> +        if (errno == EINVAL) {
>> +            warn_report("%s: Failed to place page %p. Waking up any 
>> waiters.",
>> +                         __func__, host_addr);
>> +        }
>> +        ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);
> 
> ... if with above information (takes notes on where we registered
> uffd), I think we don't need to capture error, but we can simply skip
> those outliers.

I think we could skip them in general. Nobody should be touching that
memory. But debugging e.g., a SEGFAULT is easier than debugging some
sleeping thread IMHO.

-- 
Thanks,

David / dhildenb




reply via email to

[Prev in Thread] Current Thread [Next in Thread]