From: Peter Xu
Subject: Re: [External] Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate
Date: Wed, 7 Dec 2022 17:08:48 -0500

On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:
> 
> On 2022/12/6 12:28 AM, Peter Xu wrote:
> > Chuang,
> > 
> > No worry on the delay; you replied faster than I read yours. :)
> > 
> > On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:
> > > > As a start, maybe you can try poisoning address_space_to_flatview()
> > > > (e.g. by checking the start_pack_mr_change flag and asserting it is
> > > > not set) during this process, to see whether any call stack can even
> > > > try to dereference a flatview.
> > > > 
> > > > It's just that I haven't figured out a good way to "prove" its
> > > > validity, even if I think this is an interesting idea worth exploring
> > > > to shrink the downtime.
> > > Thanks for your suggestions!
> > > I used a thread-local variable to identify whether the current thread is
> > > the migration thread (the main thread of the target qemu), and I modified
> > > qemu_coroutine_switch to make sure the thread-local variable is true only
> > > in the process_incoming_migration_co call stack. If the target qemu
> > > detects that start_pack_mr_change is set and address_space_to_flatview()
> > > is called in a non-migrating thread or non-migrating coroutine, it will
> > > crash.
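
For illustration, a minimal sketch of that guard (the thread-local flag
name and hook point are hypothetical; start_pack_mr_change is the flag
from the RFC patch):

/* Hypothetical flag, set only while the incoming-migration coroutine
 * (process_incoming_migration_co) is running on this thread. */
static __thread bool in_incoming_migration;

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * While commits are being delayed (start_pack_mr_change set), only
     * the migration load path may look up a flatview; any other caller
     * could be handed a map that is about to become stale.
     */
    if (start_pack_mr_change) {
        assert(in_incoming_migration);
    }
    return qatomic_rcu_read(&as->current_map);
}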
> > Are you using the thread var just to avoid the assert triggering in the
> > migration thread when committing memory changes?
> > 
> > I think _maybe_ another cleaner way to sanity check this is to assert
> > directly on the depth:
> > 
> > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > {
> >      /*
> >       * Before using any flatview, sanity check we're not during a memory
> >       * region transaction or the map can be invalid.  Note that this can
> >       * also be called during commit phase of memory transaction, but that
> >       * should also only happen when the depth decreases to 0 first.
> >       */
> >      assert(memory_region_transaction_depth == 0);
> >      return qatomic_rcu_read(&as->current_map);
> > }
> > 
> > That should also cover the safe cases of memory transaction commits during
> > migration.
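
(For context, the depth counter in softmmu/memory.c nests roughly like
this, heavily simplified; only when the depth drops back to zero does a
commit regenerate the flatviews and notify the listeners:)

void memory_region_transaction_begin(void)
{
    qemu_flush_coalesced_mmio_buffer();
    ++memory_region_transaction_depth;
}

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;
    if (!memory_region_transaction_depth) {
        if (memory_region_update_pending) {
            /* Regenerate FlatViews, switch each AddressSpace over,
             * and run the memory listeners. */
            flatviews_reset();
        }
    }
}

So a commit made while depth > 0 is deferred, and the assert above only
trips on a flatview lookup made before the outermost commit has finished.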
> > 
> Peter, I tried this way and found that the target qemu will crash.
> 
> Here is the gdb backtrace:
> 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ff2929d851a in __GI_abort () at abort.c:118
> #2  0x00007ff2929cfe67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766, function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
> #3  0x00007ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766, function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:101
> #4  0x000055a324b2ed5e in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at /data00/migration/qemu-5.2.0/include/exec/memory.h:766
> #5  0x000055a324e79559 in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:811
> #6  address_space_get_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:805
> #7  0x000055a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, as=<optimized out>, addr=addr@entry=68404985856, len=len@entry=4096, is_write=false) at ../softmmu/physmem.c:3307
> #8  0x000055a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) at ../hw/virtio/virtio.c:185
> #9  0x000055a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=<optimized out>, version_id=<optimized out>) at ../hw/virtio/virtio.c:3203
> #10 0x000055a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, vmsd=0x55a325fc1a60 <vmstate_virtio_scsi>, opaque=0x55a32985d9a0, version_id=1) at ../migration/vmstate.c:143
> #11 0x000055a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at ../migration/savevm.c:913
> #12 0x000055a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, f=0x55a329dc0c00) at ../migration/savevm.c:2741
> #13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
> #14 0x000055a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at ../migration/savevm.c:3021
> #15 0x000055a324d14b4e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:574
> #16 0x000055a32501ae3b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
> #17 0x00007ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #18 0x00007ffed80dc2a0 in ?? ()
> #19 0x0000000000000000 in ?? ()
> 
> address_space_cache_init() is the only caller of address_space_to_flatview()
> I can find in the vmstate_load call stack so far. Although I think the mr
> used by address_space_cache_init() won't be affected by the delay of
> memory_region_transaction_commit(), we really need a mechanism to prevent
> a modified mr from being used.
> 
> Maybe we can build a stale list:
> If a subregion is added, add its parent to the stale list (considering that
> a new subregion's priority has uncertain effects on the flatviews).
> If a subregion is deleted, add the subregion itself to the stale list.
> When memory_region_transaction_commit() regenerates the flatviews, clear
> the stale list.
> When address_space_translate_internal() is called, check whether the mr it
> looks up matches one of the MRs (or their children) in the stale list. If
> it does, trigger a crash.
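
For illustration, a rough sketch of that bookkeeping (all names here are
invented for the example):

/* Hypothetical stale list, protected by the BQL like other MR state. */
static GSList *stale_mrs;

static void mr_mark_stale(MemoryRegion *mr)
{
    stale_mrs = g_slist_prepend(stale_mrs, mr);
}

/* Subregion added: the parent's mapping may change in unknown ways. */
static void on_subregion_add(MemoryRegion *parent, MemoryRegion *child)
{
    mr_mark_stale(parent);
}

/* Subregion deleted: the removed region itself becomes suspect. */
static void on_subregion_del(MemoryRegion *child)
{
    mr_mark_stale(child);
}

/* Called when memory_region_transaction_commit() regenerates flatviews. */
static void stale_mrs_clear(void)
{
    g_slist_free(stale_mrs);
    stale_mrs = NULL;
}

/* Checked from address_space_translate_internal(): crash on a stale hit
 * (matching children of a stale MR is omitted in this sketch). */
static void assert_not_stale(MemoryRegion *mr)
{
    assert(!g_slist_find(stale_mrs, mr));
}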

I'm not sure that'll work, though.  Consider this graph:

                            A
                           / \
                          B   C
                       (p=1) (p=0)

A, B, C are MRs; B & C are subregions of A.  When B's priority is higher
(p=1), any access to A will go to B, so far so good.

Then, let's assume D comes under C with even higher priority:

                            A
                           / \
                          B   C
                       (p=1) (p=0)
                              |
                              D
                             (p=2)


Adding C to the stale list won't work, because with the old flatview the
lookup will point to B instead, while B is not in the stale list.  The AS
operation will carry on without noticing it's already wrong.
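
(In code terms, the second graph above would be built with the overlap
API, roughly like this; A, B, C, D are the MemoryRegion pointers from
the diagram, offsets and sizes are arbitrary, and priorities are only
compared among siblings of the same container:)

memory_region_add_subregion_overlap(A, 0, B, 1);  /* B: priority 1 */
memory_region_add_subregion_overlap(A, 0, C, 0);  /* C: priority 0 */
memory_region_add_subregion_overlap(C, 0, D, 2);  /* D: priority 2, under C */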

> 
> There may be many details to consider in this mechanism. I hope you can
> give some suggestions on its feasibility.

For this specific case, I'm wondering whether we can just postpone the init
of the vring cache until migration completes.

One thing to mention from what I read: with your approach, when the batched
commit() happens at the end of migration we'll need to update all the caches
in virtio_memory_listener_commit() anyway, so we'd rebuild the vring caches
once and for all, which also looks nice if possible.

There are some details to consider.  E.g. the commit() happens only when
memory_region_update_pending == true.  We may want to make sure the cache is
initialized unconditionally, at least.  Not sure whether that's doable,
though.
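
A speculative sketch of what the postponement could look like (the
predicate name is assumed; virtio_init_region_cache() is the real
function from the backtrace above):

static void virtio_init_region_cache(VirtIODevice *vdev, int n)
{
    /*
     * Hypothetical: while the incoming migration is still loading device
     * state, skip the rebuild; the batched commit at the end of migration
     * would re-run virtio_memory_listener_commit() and rebuild the caches.
     * Care is needed for the memory_region_update_pending == false case,
     * where no commit callback would fire.
     */
    if (migration_incoming_in_progress()) {   /* assumed predicate */
        return;
    }
    /* ... existing cache initialization ... */
}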

Thanks,

-- 
Peter Xu



