qemu-devel

Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation


From: zhanghailiang
Subject: Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
Date: Mon, 24 Nov 2014 16:25:40 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 2014/11/22 2:56, Andrea Arcangeli wrote:
On Fri, Nov 21, 2014 at 11:48:03AM +0800, zhanghailiang wrote:
Hi David,

When I migrated a VM using postcopy, with the VM configured with the
'-realtime mlock=on' option, it failed and reported
"postcopy_ram_hosttest: remap_anon_pages not available: File exists"
on the destination.

Is it a bug in the userfaultfd API?

It's not userfaultfd related, but it's remap_anon_pages related (in
the future mcopy_atomic or equivalent userfaultfd cmd) and
MADV_DONTNEED related.

If the destination qemu starts with mlockall(current|future), -EEXIST
saves the day by noticing all not yet transferred pages were already
present in the destination (as allocated zero pages). We can't trigger
non-present faults (in userfaultfd) if the dst starts with mlockall.

Furthermore if precopy has been run before postcopy (currently it's
always the case as there's no way to specify the number of precopy
passes to run before starting postcopy... in turn allowing to specify
zero passes) the bitmap with the re-dirtied pages must be transferred
to the destination before postcopy can start, and MADV_DONTNEED has to
be used to zap those re-dirtied pages. But MADV_DONTNEED will also fail
with -EINVAL, well before postcopy starts, if mlockall is set on the
destination qemu.

If MADV_DONTNEED didn't fail with -EINVAL in the destination, there
probably wasn't any re-dirtied page.
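
The MADV_DONTNEED behaviour described above can be observed in isolation
with a small probe. This is only a sketch, assuming a Linux host where
locking a single page fits within RLIMIT_MEMLOCK; `struct zap_probe` and
`probe_dontneed` are hypothetical names for illustration, not QEMU code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Result of one probe. All names here are illustrative, not QEMU code. */
struct zap_probe {
    int setup_ok;        /* 1 if mmap (and mlock, when requested) succeeded */
    int madvise_errno;   /* 0 on success, otherwise errno from madvise() */
    unsigned char first; /* first byte of the page after the madvise() call */
};

/* Map one anonymous page, dirty it, optionally mlock it, then try to zap it. */
static struct zap_probe probe_dontneed(int lock_first)
{
    struct zap_probe r = {0, 0, 0};
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return r;

    memset(p, 0xAB, (size_t)page);  /* make the page present and non-zero */

    if (lock_first && mlock(p, (size_t)page) != 0) {
        munmap(p, (size_t)page);    /* e.g. RLIMIT_MEMLOCK is too small */
        return r;
    }
    r.setup_ok = 1;

    /* On locked memory this is expected to fail with EINVAL. */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0)
        r.madvise_errno = errno;

    r.first = p[0];                 /* 0x00 if zapped, 0xAB if left in place */
    munmap(p, (size_t)page);
    return r;
}
```

Calling `probe_dontneed(0)` should report a successful zap (the page reads
back as zero), while `probe_dontneed(1)` should report EINVAL with the
page contents untouched, which is the failure the destination would hit
when zapping re-dirtied pages under mlockall.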

remap_anon_pages is extremely strict (unlike vma-mangling mremap that
would just zap the dst range vma silently if it existed) so it cannot
overwrite the guest memory and you get EEXIST (the strictness was
intentional to eliminate the risk of any memory corruption if userland
hits a bug like in this case).

But it should have failed earlier, with MADV_DONTNEED returning -EINVAL,
if there was any re-dirtied page between the last precopy pass and
postcopy (I assume the guest was idle?).


You are right ;)

In short I think to fix this qemu should call mlockall in the
destination only after postcopy is complete. There's no way to lock
the memory in the destination if the memory still resides in the
source so some userfault may have to happen (and if userfaults happen,
it means we're not mlocked yet).


Got it, so this problem should be fixed in qemu. Thanks for your explanation.
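
A rough sketch of the ordering suggested above, with mlockall deferred on
the destination until postcopy is complete. Every name here
(`struct incoming_state`, `try_lock_memory`, `on_postcopy_complete`,
`counting_lock_stub`) is hypothetical and only illustrates the idea; it is
not how QEMU implements or will implement the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical state for the incoming side; not QEMU's actual structures. */
struct incoming_state {
    bool mlock_requested;  /* user asked for '-realtime mlock=on' */
    bool postcopy_active;  /* some pages may still reside on the source */
    bool locked;           /* memory has already been locked */
};

/* Same signature as mlockall(2), so a stub can stand in for it. */
typedef int (*lock_fn)(int flags);

/* Stub used for illustration: records calls instead of locking memory. */
static int lock_calls;
static int last_flags;
static int counting_lock_stub(int flags)
{
    lock_calls++;
    last_flags = flags;
    return 0;
}

/*
 * Lock memory only when no userfault can still be needed.
 * Returns 0 on success or deferral, -1 if the lock call itself failed.
 */
static int try_lock_memory(struct incoming_state *s, lock_fn do_lock)
{
    if (!s->mlock_requested || s->locked)
        return 0;
    if (s->postcopy_active)
        return 0;  /* deferred: locking now would pre-fault zero pages */
    if (do_lock(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    s->locked = true;
    return 0;
}

/* Called once the last page has arrived from the source. */
static int on_postcopy_complete(struct incoming_state *s, lock_fn do_lock)
{
    s->postcopy_active = false;
    return try_lock_memory(s, do_lock);
}
```

In real use `mlockall` itself could be passed as the `lock_fn`; the stub
only makes the ordering visible: nothing is locked while postcopy is
active, and the lock is taken exactly once after completion.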




