On 2022/12/6 上午12:28, Peter Xu wrote:
Chuang,
No worry on the delay; you're faster than when I read yours. :)
On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:
As a start, maybe you can try with poison address_space_to_flatview() (by
e.g. checking the start_pack_mr_change flag and assert it is not set)
during this process to see whether any call stack can even try to
dereference a flatview.
It's just that I didn't figure a good way to "prove" its validity, even if
I think this is an interesting idea worth thinking to shrink the downtime.
Thanks for your sugguestions!
I used a thread local variable to identify whether the current thread is a
migration thread(main thread of target qemu) and I modified the code of
qemu_coroutine_switch to make sure the thread local variable true only in
process_incoming_migration_co call stack. If the target qemu detects that
start_pack_mr_change is set and address_space_to_flatview() is called in
non-migrating threads or non-migrating coroutine, it will crash.
Are you using the thread var just to avoid the assert triggering in the
migration thread when commiting memory changes?
I think _maybe_ another cleaner way to sanity check this is directly upon
the depth:
static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
/*
* Before using any flatview, sanity check we're not during a memory
* region transaction or the map can be invalid. Note that this can
* also be called during commit phase of memory transaction, but that
* should also only happen when the depth decreases to 0 first.
*/
assert(memory_region_transaction_depth == 0);
return qatomic_rcu_read(&as->current_map);
}
That should also cover the safe cases of memory transaction commits during
migration.
Peter, I tried this way and found that the target qemu will crash.
Here is the gdb backtrace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ff2929d851a in __GI_abort () at abort.c:118
#2 0x00007ff2929cfe67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
line=line@entry=766, function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
#3 0x00007ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766,
function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:101
#4 0x000055a324b2ed5e in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at /data00/migration/qemu-5.2.0/include/exec/memory.h:766
#5 0x000055a324e79559 in address_space_to_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:811
#6 address_space_get_flatview (as=0x55a326132580 <address_space_memory>) at ../softmmu/memory.c:805
#7 0x000055a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, as=<optimized out>, addr=addr@entry=68404985856, len=len@entry=4096, is_write=false) at ../softmmu/physmem.c:3307
#8 0x000055a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) at ../hw/virtio/virtio.c:185
#9 0x000055a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=<optimized out>, version_id=<optimized out>) at ../hw/virtio/virtio.c:3203
#10 0x000055a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, vmsd=0x55a325fc1a60 <vmstate_virtio_scsi>, opaque=0x55a32985d9a0, version_id=1) at ../migration/vmstate.c:143
#11 0x000055a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at ../migration/savevm.c:913
#12 0x000055a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, f=0x55a329dc0c00) at ../migration/savevm.c:2741
#13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
#14 0x000055a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at ../migration/savevm.c:3021
#15 0x000055a324d14b4e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:574
#16 0x000055a32501ae3b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#17 0x00007ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007ffed80dc2a0 in ?? ()
#19 0x0000000000000000 in ?? ()
address_space_cache_init() is the only caller of address_space_to_flatview
I can find in vmstate_load call stack so far. Although I think the mr used
by address_space_cache_init() won't be affected by the delay of
memory_region_transaction_commit(), we really need a mechanism to prevent
the modified mr from being used.
Maybe we can build a stale list:
If a subregion is added, add its parent to the stale list(considering that
new subregion's priority has uncertain effects on flatviews).
If a subregion is deleted, add itself to the stale list.
When memory_region_transaction_commit() regenerates flatviews, clear the
stale list.
when address_space_translate_internal() is called, check whether the mr
looked up matches one of mrs(or its child)in the stale list. If yes, a
crash will be triggered.