[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH v3 0/5] migrations: Fix potential rare race of migration-test aft
|
From: |
Peter Xu |
|
Subject: |
[PATCH v3 0/5] migrations: Fix potential rare race of migration-test after yank |
|
Date: |
Thu, 22 Jul 2021 13:58:36 -0400 |
v3:
- Use WITH_QEMU_LOCK_GUARD() for patch 2 [Eric]
(potentially I can also replace other existing uses of qemu_file_lock into
WITH_QEMU_LOCK_GUARD, but I decided to took Dave's r-b first and leave that
for later)
- Added r-bs for Dave on patch 2/4
- Add a comment in patch 5 to explain why it's safe to unregister yank without
qemu_file_lock, in postcopy_pause() [Dave]
v2:
- Pick r-b for Dave on patch 1/3
- Move migration_file_get_ioc() from patch 5 to patch 4, meanwhile rename it to
qemu_file_get_ioc(). [Dave]
- Patch 2 "migration: Shutdown src in await_return_path_close_on_source()" is
replaced by patch "migration: Make from_dst_file accesses thread-safe" [Dave]
Patch 1 fixes a possible race that migration thread can accidentally skip
join() of rp_thread even if the return thread is enabled. Patch 1 is suspected
to also be the root cause of the recent hard-to-reproduce migration-test
failure here reported by PMM:
https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/
I didn't reproduce it myself; but after co-debugged with Dave it's suspected
that the race of rp_thread could be the cause. It's not exposed before because
yank is soo strict on releasing instances, while we're not that strict before,
and didn't join() on rp_thread wasn't so dangerous after all when migration
succeeded before.
Patch 2 fixes another theoretical race on accessing from_dst_file spotted by
Dave. I don't think there's known issues with it, but may still worth fixing.
Patch 3 should be a cleanup on yank that I think would be nice to have.
Patch 4-5 are further cleanups to remove the ref==1 check in channel_close(),
finally, as I always thought that's a bit hackish. So I used explicit
unregister of the yank function at necessary places to replace that ref==1 one.
I still think having patch 3-5 altogether would be great, however I think patch
1 should still be the most important to be reviewed. Also it would be great to
know whether patch 1 could fix the known yank crash.
Please review, thanks.
Peter Xu (5):
migration: Fix missing join() of rp_thread
migration: Make from_dst_file accesses thread-safe
migration: Introduce migration_ioc_[un]register_yank()
migration: Teach QEMUFile to be QIOChannel-aware
migration: Move the yank unregister of channel_close out
migration/channel.c | 15 ++-------
migration/migration.c | 57 +++++++++++++++++++++++++++--------
migration/migration.h | 15 +++++++--
migration/multifd.c | 8 ++---
migration/qemu-file-channel.c | 11 ++-----
migration/qemu-file.c | 17 ++++++++++-
migration/qemu-file.h | 4 ++-
migration/ram.c | 3 +-
migration/savevm.c | 11 +++++--
migration/yank_functions.c | 42 ++++++++++++++++++++++++++
migration/yank_functions.h | 3 ++
11 files changed, 138 insertions(+), 48 deletions(-)
--
2.31.1
- [PATCH v3 0/5] migrations: Fix potential rare race of migration-test after yank,
Peter Xu <=
- [PATCH v3 1/5] migration: Fix missing join() of rp_thread, Peter Xu, 2021/07/22
- [PATCH v3 2/5] migration: Make from_dst_file accesses thread-safe, Peter Xu, 2021/07/22
- [PATCH v3 3/5] migration: Introduce migration_ioc_[un]register_yank(), Peter Xu, 2021/07/22
- [PATCH v3 4/5] migration: Teach QEMUFile to be QIOChannel-aware, Peter Xu, 2021/07/22
- [PATCH v3 5/5] migration: Move the yank unregister of channel_close out, Peter Xu, 2021/07/22
- Re: [PATCH v3 0/5] migrations: Fix potential rare race of migration-test after yank, Lukas Straub, 2021/07/24
- Re: [PATCH v3 0/5] migrations: Fix potential rare race of migration-test after yank, Dr. David Alan Gilbert, 2021/07/26