qemu-devel



From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
Date: Wed, 26 Apr 2017 21:37:43 +0200
User-agent: Mutt/1.8.2 (2017-04-18)

Hello,

On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (address@hidden) wrote:
> > On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > 
> > > When an all-zero page is received during the precopy
> > > phase of a postcopy-enabled migration, we must force
> > > allocation; otherwise accesses to the page will still
> > > get blocked by userfault.
> > > 
> > > Symptom:
> > >   a) If the page is accessed by a device during device-load,
> > >     then we get a deadlock, as the source finishes sending
> > >     all its pages but the destination device-load is still
> > >     paused and so doesn't clean up.
> > > 
> > >   b) If the page is accessed later, then the thread will stay
> > >     paused until we release userfault at the end of migration,
> > >     rather than carrying on running.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> > > Reported-by: Christian Borntraeger <address@hidden>
> > 
> > CC stable? After all, the guest hangs on both sides.
> > 
> > Has survived 40 migrations (usually failed at the 2nd)
> > Tested-by: Christian Borntraeger <address@hidden>
> 
> Great...but.....
> Andrea (added to the mail) says this shouldn't be necessary.
> The read we were doing in the is_zero_range() should have been sufficient
> to get the page mapped and that zero page should have survived.
> 
> So - I guess that's back a step; we need to figure out why the
> page disappears for you.

Yes, reading during precopy is enough to fill the hole and prevent
userfault missing faults from triggering.

One way or another, the pagetable must then be mapped by a zeropage,
a huge zeropage, a regular page allocated during a previous precopy
pass, or a pre-zeroed subpage of a THP.

Even if the huge zeropage is later split by an MADV_DONTNEED when
postcopy starts, the parts it does not zap will become 4k zeropages.

After a read succeeds, nothing except MADV_DONTNEED or other syscalls,
which QEMU would have to invoke explicitly between is_zero_range() and
UFFDIO_REGISTER, should be able to bring the pagetable back to the
"pte_none/pmd_none" state that triggers missing userfaults during
postcopy.
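To make the argument above concrete, here is a toy C model of the states a single 4k page-table slot can be in. It is only a sketch under stated assumptions: the names pte_state, op, apply_op and missing_fault_after are hypothetical, nothing here calls madvise() or userfaultfd, and syscalls other than MADV_DONTNEED are left out. It encodes the claim that once a read has filled the hole, only an explicit zap brings the slot back to pte_none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: the state of one 4k
 * page-table slot in an anonymous private mapping such as guest RAM. */
enum pte_state {
    PTE_NONE,     /* hole ("pte_none"): a MISSING userfault fires here */
    PTE_ZEROPAGE, /* read-only mapping of the shared (4k or huge) zero page */
    PTE_PAGE      /* private page, e.g. allocated by a write, or a THP subpage */
};

/* Illustrative operations from the discussion (names are made up). */
enum op {
    OP_READ,         /* read access, e.g. the zero check in is_zero_range() */
    OP_WRITE,        /* write access, e.g. copying in a non-zero page */
    OP_SPLIT_HUGE,   /* huge zero page split; this slot is outside the zap */
    OP_MADV_DONTNEED /* madvise(MADV_DONTNEED) covering this 4k slot */
};

enum pte_state apply_op(enum pte_state s, enum op o)
{
    switch (o) {
    case OP_READ:
        /* A read fault on a hole maps the zero page; mapped slots are kept. */
        return s == PTE_NONE ? PTE_ZEROPAGE : s;
    case OP_WRITE:
        /* A write always leaves a private page (COW from the zero page). */
        return PTE_PAGE;
    case OP_SPLIT_HUGE:
        /* Splitting keeps the slot mapped: huge zero page -> 4k zero page. */
        return s;
    case OP_MADV_DONTNEED:
        /* Zapping is the only operation in this model that recreates a hole. */
        return PTE_NONE;
    }
    assert(0 && "unknown operation");
    return s;
}

/* Start from a never-touched slot, apply the ops in order (all assumed to
 * happen before UFFDIO_REGISTER), then report whether a postcopy access
 * would raise a MISSING userfault. */
bool missing_fault_after(const enum op *ops, size_t n)
{
    enum pte_state s = PTE_NONE;
    for (size_t i = 0; i < n; i++)
        s = apply_op(s, ops[i]);
    return s == PTE_NONE;
}
```

For example, missing_fault_after((enum op[]){OP_READ}, 1) returns false, while missing_fault_after((enum op[]){OP_READ, OP_MADV_DONTNEED}, 2) returns true: in this model, a page that passed the zero check can only raise a missing userfault again if something zaps it before registration.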

Thanks,
Andrea


