qemu-devel

Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate


From: Peter Xu
Subject: Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate
Date: Thu, 8 Dec 2022 11:00:56 -0500

On Thu, Dec 08, 2022 at 10:39:11PM +0800, Chuang Xu wrote:
> 
> > On 2022/12/8 6:08 AM, Peter Xu wrote:
> > On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:
> > > On 2022/12/6 12:28 AM, Peter Xu wrote:
> > > > Chuang,
> > > > 
> > > > No worries about the delay; you're faster than I am at reading yours. :)
> > > > 
> > > > On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:
> > > > > > As a start, maybe you can try poisoning address_space_to_flatview()
> > > > > > (e.g. by checking the start_pack_mr_change flag and asserting it is
> > > > > > not set) during this process, to see whether any call stack can even
> > > > > > try to dereference a flatview.
> > > > > > 
> > > > > > It's just that I didn't figure out a good way to "prove" its
> > > > > > validity, even if I think this is an interesting idea worth pursuing
> > > > > > to shrink the downtime.
> > > > > Thanks for your suggestions!
> > > > > I used a thread-local variable to identify whether the current thread
> > > > > is the migration thread (the main thread of the target QEMU), and I
> > > > > modified qemu_coroutine_switch to make sure the thread-local variable
> > > > > is true only in the process_incoming_migration_co call stack. If the
> > > > > target QEMU detects that start_pack_mr_change is set and
> > > > > address_space_to_flatview() is called in a non-migrating thread or
> > > > > non-migrating coroutine, it will crash.
> > > > Are you using the thread var just to avoid the assert triggering in the
> > > > migration thread when committing memory changes?
> > > > 
> > > > I think _maybe_ another cleaner way to sanity check this is directly
> > > > upon the depth:
> > > > 
> > > > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > > > {
> > > >       /*
> > > >        * Before using any flatview, sanity check we're not during a
> > > >        * memory region transaction or the map can be invalid.  Note
> > > >        * that this can also be called during the commit phase of a
> > > >        * memory transaction, but that should only happen after the
> > > >        * depth decreases to 0 first.
> > > >        */
> > > >       assert(memory_region_transaction_depth == 0);
> > > >       return qatomic_rcu_read(&as->current_map);
> > > > }
> > > > 
> > > > That should also cover the safe cases of memory transaction commits
> > > > during migration.
> > > > 
> > > Peter, I tried this way and found that the target qemu will crash.
> > > 
> > > Here is the gdb backtrace:
> > > 
> > > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > #1  0x00007ff2929d851a in __GI_abort () at abort.c:118
> > > #2  0x00007ff2929cfe67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766, function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
> > > #3  0x00007ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766, function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:101
> > > #4  0x000055a324b2ed5e in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at /data00/migration/qemu-5.2.0/include/exec/memory.h:766
> > > #5  0x000055a324e79559 in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:811
> > > #6  address_space_get_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:805
> > > #7  0x000055a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, as=<optimized out>, addr=addr@entry=68404985856, len=len@entry=4096, is_write=false) at ../softmmu/physmem.c:3307
> > > #8  0x000055a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) at ../hw/virtio/virtio.c:185
> > > #9  0x000055a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=<optimized out>, version_id=<optimized out>) at ../hw/virtio/virtio.c:3203
> > > #10 0x000055a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, vmsd=0x55a325fc1a60 <vmstate_virtio_scsi>, opaque=0x55a32985d9a0, version_id=1) at ../migration/vmstate.c:143
> > > #11 0x000055a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at ../migration/savevm.c:913
> > > #12 0x000055a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, f=0x55a329dc0c00) at ../migration/savevm.c:2741
> > > #13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
> > > #14 0x000055a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at ../migration/savevm.c:3021
> > > #15 0x000055a324d14b4e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:574
> > > #16 0x000055a32501ae3b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
> > > #17 0x00007ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > > #18 0x00007ffed80dc2a0 in ?? ()
> > > #19 0x0000000000000000 in ?? ()
> > > 
> > > address_space_cache_init() is the only caller of address_space_to_flatview()
> > > I can find in the vmstate_load call stack so far. Although I think the MR
> > > used by address_space_cache_init() won't be affected by the delay of
> > > memory_region_transaction_commit(), we really need a mechanism to prevent
> > > a modified MR from being used.
> > > 
> > > Maybe we can build a stale list:
> > > - If a subregion is added, add its parent to the stale list (considering
> > >   that the new subregion's priority has uncertain effects on flatviews).
> > > - If a subregion is deleted, add the subregion itself to the stale list.
> > > - When memory_region_transaction_commit() regenerates flatviews, clear
> > >   the stale list.
> > > - When address_space_translate_internal() is called, check whether the MR
> > >   looked up matches one of the MRs (or their children) in the stale list.
> > >   If it does, trigger a crash.
> > I'm not sure that'll work, though.  Consider this graph:
> > 
> >                              A
> >                             / \
> >                            B   C
> >                         (p=1) (p=0)
> > 
> > A, B, C are MRs; B and C are subregions of A.  When B's priority is
> > higher (p=1), any access to A will go to B, so far so good.
> > 
> > Then, let's assume D comes under C with even higher priority:
> > 
> >                              A
> >                             / \
> >                            B   C
> >                         (p=1) (p=0)
> >                                |
> >                                D
> >                               (p=2)
> > 
> > 
> > Adding C into the stale list won't work, because with the old flatview
> > the access will point to B instead, while B is not in the stale list.
> > The AS operation will carry on without noticing it's already wrong.
> 
> Peter, I think our understanding of priority is different.
> 
> The QEMU docs
> (https://qemu.readthedocs.io/en/stable-6.1/devel/memory.html#overlapping-regions-and-priority)
> say 'Priority values are local to a container, because the priorities of
> two regions are only compared when they are both children of the same
> container.'
> And as I read the code, when doing the render_memory_region() operation
> on A, QEMU will first insert B's FlatRanges and its children's FlatRanges
> recursively, because B's priority is higher than C's. After B's FlatRanges
> and its children's FlatRanges are all inserted into the flatview, C's
> FlatRanges and its children's FlatRanges will be inserted into the gaps
> left by B, if B and C overlap.
> 
> So I think adding D as C's subregion has no effect on B in your second
> case. The old FlatRange pointing to B is still effective. C and C's
> children with lower priority than D will be affected, but we have flagged
> them as stale.
> 
> I hope I have no misunderstanding of the flatview's construction code. If
> I understand wrong, please forgive my ignorance..😭

No, I think you're right.. thanks, I should have read the code/doc first
rather than trusting myself. :)

But still, the whole point is that the parent may not even be visible to
the flatview, so I still don't know how it could work.

My 2nd attempt:

                                  A
                                  |
                                  B
                                (p=1)

Adding C with p=2:

                                  A
                                 / \
                                B   C
                             (p=1) (p=2)

IIUC the flatview entry for the offset where A resides should point to B,
and after C is plugged we'll still look it up and find B.  Even if A is in
the stale list, B is not?

The other thing I didn't mention is that I don't think address space
translation is the sole consumer of the flatview.  Some examples:

common_semi_find_bases() walks the flatview without translations.

memory_region_update_coalesced_range() (calls address_space_get_flatview()
first) notifies kvm coalesced mmio regions without translations.

So hooking up address_space_translate_internal() by itself may not be
enough either.

> 
> > 
> > > There may be many details to consider in this mechanism. Hope you can give
> > > some suggestions on its feasibility.
> > For this specific case, I'm wildly thinking whether we can just postpone
> > the init of the vring cache until migration completes.
> > 
> > One thing to mention, from what I read: we'll need to update all the
> > caches in virtio_memory_listener_commit() anyway when the batched
> > commit() happens at the end of migration with your approach, so we'll
> > rebuild the vring caches once and for all, which also looks nice if
> > possible.
> > 
> > There are some details to consider.  E.g. the commit() happens only when
> > memory_region_update_pending==true.  We may want to make sure the cache
> > is initialized unconditionally, at least.  Not sure whether that's
> > doable, though.
> > 
> > Thanks,
> > 
> Good idea! We can try it in the new patches! And with the delay of
> virtio_init_region_cache(), we can still use the assert in
> address_space_to_flatview().
> However, I think the stale list is worth keeping as a backup scheme for
> further discussion, because it may adapt to more complex scenarios.

If the assert will work, that'll be even better.  I was actually worried it
could trigger just like what you mentioned in the virtio path; I didn't
expect that to come so soon.  If there's only a minimal set of cases and we
can fix them up easily, that'll be great.  Hopefully there aren't too many,
or we'll need to revisit the whole idea.

Thanks,

-- 
Peter Xu



