Re: [PATCH 0/2] Postcopy migration and vhost-user errors
From: Peter Xu
Subject: Re: [PATCH 0/2] Postcopy migration and vhost-user errors
Date: Thu, 11 Jul 2024 11:38:38 -0400
On Thu, Jul 11, 2024 at 06:44:22PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Hello,
>
> * virsh(1) offers multiple options to initiate Postcopy migration:
>
> 1) virsh migrate --postcopy --postcopy-after-precopy
> 2) virsh migrate --postcopy + virsh migrate-postcopy
> 3) virsh migrate --postcopy --timeout <N> --timeout-postcopy
>
> When Postcopy migration is invoked via method (2) or (3) above,
> the guest on the destination host seems to hang or get stuck sometimes.
>
> * During Postcopy migration, multiple threads are spawned on the destination
> host to start the guest and setup devices. One such thread starts vhost
> device via vhost_dev_start() function and another called fault_thread handles
Hmm, I thought it was one of the vcpu threads that invoked
vhost_dev_start(), rather than any migration thread?
> page faults in user space using kernel's userfaultfd(2) system.
>
> When fault_thread exits upon completion of Postcopy migration, it sends a
> 'postcopy_end' message to the vhost-user device. But sometimes 'postcopy_end'
> message is sent while vhost device is being setup via vhost_dev_start().
>
> Thread-1                              Thread-2
>
> vhost_dev_start                       postcopy_ram_incoming_cleanup
> vhost_device_iotlb_miss               postcopy_notify
> vhost_backend_update_device_iotlb     vhost_user_postcopy_notifier
> vhost_user_send_device_iotlb_msg      vhost_user_postcopy_end
> process_message_reply                 process_message_reply
> vhost_user_read                       vhost_user_read
> vhost_user_read_header                vhost_user_read_header
> "Fail to update device iotlb"         "Failed to receive reply to postcopy_end"
>
> This creates confusion when vhost device receives 'postcopy_end' message while
> it is still trying to update IOTLB entries.
>
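To make the interleaving quoted above concrete, here is a minimal sketch of two threads sharing one request/reply channel, where each exchange (send, then read the reply) is held under a single lock so a thread can only ever read its own reply. This is not the code from the series: the series adds a write-read lock to vhost-user, while this sketch swaps in a plain pthread mutex, and `Channel`, `channel_request()`, `worker_run()` and `run_race()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the single vhost-user socket: one slot holds
 * the reply to the most recent request. Not QEMU code. */
typedef struct {
    pthread_mutex_t lock;   /* serializes a whole request/reply exchange */
    int reply_slot;         /* what the simulated backend last answered */
} Channel;

/* Illustrative request types, named after the two paths in the trace. */
enum { REQ_IOTLB_UPDATE = 1, REQ_POSTCOPY_END = 2 };

static void channel_init(Channel *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->reply_slot = 0;
}

/* Send a request and read its reply as one critical section. */
static int channel_request(Channel *c, int req)
{
    int reply;

    pthread_mutex_lock(&c->lock);
    c->reply_slot = req;    /* "send": the simulated backend echoes the type */
    reply = c->reply_slot;  /* "read": still our own reply, lock is held */
    pthread_mutex_unlock(&c->lock);
    return reply;
}

typedef struct {
    Channel *chan;
    int req;
    int mismatches;         /* replies that did not match our request */
} Worker;

static void *worker_run(void *opaque)
{
    Worker *w = opaque;

    for (int i = 0; i < 100000; i++) {
        if (channel_request(w->chan, w->req) != w->req) {
            w->mismatches++;
        }
    }
    return NULL;
}

/* Run both exchanges concurrently; returns the total mismatched replies. */
static int run_race(void)
{
    Channel chan;
    Worker a = { &chan, REQ_IOTLB_UPDATE, 0 };
    Worker b = { &chan, REQ_POSTCOPY_END, 0 };
    pthread_t ta, tb;

    channel_init(&chan);
    pthread_create(&ta, NULL, worker_run, &a);
    pthread_create(&tb, NULL, worker_run, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&chan.lock);
    return a.mismatches + b.mismatches;
}
```

With the lock held across both halves of the exchange, `run_race()` should report no mismatched replies. If the lock covered only the send, one thread could consume the reply meant for the other, which is roughly the confusion the trace describes.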
> This seems to leave the guest in a stranded/hung state because fault_thread
> has exited saying Postcopy migration has ended, but vhost-device is probably
> still expecting updates. QEMU logs following errors on the destination host
> ===
> ...
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header.
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to
> postcopy_end
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header.
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header.
> Flags 0x8 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header.
> Flags 0x16 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header.
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> ===
>
> * Couple of patches here help to fix/handle these errors.
I remember that after you added the rwlock, there was still a hang issue.
Did you investigate that? Or do you mean this series will fix all the
problems?
Thanks,
>
> Thank you.
> ---
> Prasad Pandit (2):
> vhost-user: add a write-read lock
> vhost: fail device start if iotlb update fails
>
> hw/virtio/vhost-user.c | 423 +++++++++++++++++++--------------
> hw/virtio/vhost.c | 6 +-
> include/hw/virtio/vhost-user.h | 3 +
> 3 files changed, 259 insertions(+), 173 deletions(-)
>
> --
> 2.45.2
>
--
Peter Xu