Re: Re: Re: [PATCH] migration/multifd: receive channel socket needs to be set to non-blocking

From: Peter Xu
Subject: Re: Re: Re: [PATCH] migration/multifd: receive channel socket needs to be set to non-blocking
Date: Wed, 25 Sep 2024 11:32:21 -0400
On Tue, Sep 24, 2024 at 08:25:22AM +0000, Yuchen wrote:
>
>
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: September 24, 2024, 0:59
> > To: yuchen (CCSPL) <yu.chen@h3c.com>
> > Cc: Peter Xu <peterx@redhat.com>; farosas@suse.de; qemu-devel@nongnu.org
> > Subject: Re: Re: [PATCH] migration/multifd: receive channel socket needs to
> > be set to non-blocking
> >
> > On Mon, Sep 23, 2024 at 01:33:13AM +0000, Yuchen wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Peter Xu <peterx@redhat.com>
> > > > Sent: September 20, 2024, 23:53
> > > > To: yuchen (CCSPL) <yu.chen@h3c.com>
> > > > Cc: farosas@suse.de; qemu-devel@nongnu.org
> > > > Subject: Re: [PATCH] migration/multifd: receive channel socket needs to
> > > > be set to non-blocking
> > > >
> > > > On Fri, Sep 20, 2024 at 10:05:42AM +0000, Yuchen wrote:
> > > > > When the migration network is disconnected, the source qemu can
> > > > > exit normally with an error, but the destination qemu stays
> > > > > blocked in recvmsg(), which blocks the destination qemu main
> > > > > thread.
> > > > >
> > > > > The destination qemu blocked thread stacks:
> > > > > Thread 13 (Thread 0x7f0178bfa640 (LWP 1895906) "multifdrecv_6"):
> > > > > #0 0x00007f041b5af56f in recvmsg ()
> > > > > #1 0x000055573ebd0b42 in qio_channel_socket_readv
> > > > > #2 0x000055573ebce83f in qio_channel_readv
> > > > > #3 qio_channel_readv_all_eof
> > > > > #4 0x000055573ebce909 in qio_channel_readv_all
> > > > > #5 0x000055573eaa1b1f in multifd_recv_thread
> > > > > #6 0x000055573ec2f0b9 in qemu_thread_start
> > > > > #7 0x00007f041b52bf7a in start_thread
> > > > > #8 0x00007f041b5ae600 in clone3
> > > > >
> > > > > Thread 1 (Thread 0x7f0410c62240 (LWP 1895156) "kvm"):
> > > > > #0 0x00007f041b528ae2 in __futex_abstimed_wait_common ()
> > > > > #1 0x00007f041b5338b8 in __new_sem_wait_slow64.constprop.0
> > > > > #2 0x000055573ec2fd34 in qemu_sem_wait (sem=0x555742b5a4e0)
> > > > > #3 0x000055573eaa2f09 in multifd_recv_sync_main ()
> > > > > #4 0x000055573e7d590d in ram_load_precopy (f=f@entry=0x555742291c20)
> > > > > #5 0x000055573e7d5cbf in ram_load (opaque=<optimized out>, version_id=<optimized out>, f=0x555742291c20)
> > > > > #6 ram_load_entry (f=0x555742291c20, opaque=<optimized out>, version_id=<optimized out>)
> > > > > #7 0x000055573ea932e7 in qemu_loadvm_section_part_end (mis=0x555741136c00, f=0x555742291c20)
> > > > > #8 qemu_loadvm_state_main (f=f@entry=0x555742291c20, mis=mis@entry=0x555741136c00)
> > > > > #9 0x000055573ea94418 in qemu_loadvm_state (f=0x555742291c20, mode=mode@entry=VMS_MIGRATE)
> > > > > #10 0x000055573ea88be1 in process_incoming_migration_co (opaque=<optimized out>)
> > > > > #11 0x000055573ec43d13 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
> > > > > #12 0x00007f041b4f5d90 in ?? () from target:/usr/lib64/libc.so.6
> > > > > #13 0x00007ffc11890270 in ?? ()
> > > > > #14 0x0000000000000000 in ?? ()
> > > > >
> > > > > Setting the receive channel to non-blocking can solve the problem.
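A minimal sketch of the kind of change being proposed above, assuming the hook
point would be the incoming multifd channel setup (e.g. multifd_recv_new_channel()
in migration/multifd.c); illustrative only, not the actual patch:

    #include "io/channel.h"

    /*
     * Illustrative sketch (not the actual patch): switch an incoming
     * multifd channel to non-blocking mode, as the commit message above
     * proposes.  Where exactly this call would live, and how EAGAIN is
     * then handled by the recv thread, is up to the real patch.
     */
    static void multifd_recv_make_nonblocking(QIOChannel *ioc)
    {
        qio_channel_set_blocking(ioc, false, NULL);
    }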
> > > >
> > > > Multifd threads are real threads and there's no coroutine involved, so
> > > > I'm slightly confused about why it needs to use non-blocking mode.
> > > >
> > > > Why didn't recvmsg() get kicked out on disconnect? Is it a generic
> > > > Linux kernel you are using?
> > > >
> > > My steps to reproduce:
> > > ifdown the migration network, or block the migration network using iptables.
> > > Both methods reproduce the problem with very high probability.
> > >
> > > My test environment uses linux-5.10.136.
> > >
> > > multifd thread blocked in the kernel:
> > > # cat /proc/3416190/stack
> > > [<0>] wait_woken+0x43/0x80
> > > [<0>] sk_wait_data+0x123/0x140
> > > [<0>] tcp_recvmsg+0x4f8/0xa50
> > > [<0>] inet6_recvmsg+0x5e/0x120
> > > [<0>] ____sys_recvmsg+0x87/0x180
> > > [<0>] ___sys_recvmsg+0x82/0x110
> > > [<0>] __sys_recvmsg+0x56/0xa0
> > > [<0>] do_syscall_64+0x3d/0x80
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > >
> > > > I wonder whether that's the expected behavior for sockets. E.g., we
> > > > do have a multifd/cancel test (test_multifd_tcp_cancel) and I think
> > > > that runs this path too, with the channel always in blocking mode as of now.
> > > >
> > > My previous statement may not be accurate. The migration network socket
> > > is not disconnected.
> > > I use ifdown or iptables to simulate a network card failure.
> > > Because the TCP connection was not disconnected, recvmsg() stayed
> > > blocked.
> >
> > How long did you wait after doing ifdown? TCP is intended to wait if
>
> I waited about 15 minutes; the source qemu migration threads quit, but
> the destination qemu migration threads are still there.
>
> > there is an interruption... only eventually, after the relevant TCP timeouts
> > are hit, will it terminate the connection. QEMU shouldn't proactively give up
> > if the TCP connection is still in an active state as reported by the kernel,
> > even if traffic isn't currently flowing.
> >
> >
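For reference, the TCP timeouts mentioned above can be bounded per connection
with TCP keepalive. A generic sketch using the standard Linux socket API (not
QEMU code, and with made-up timing values):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /*
     * Generic sketch: ask the kernel to probe an idle TCP connection and
     * abort it if the peer stops answering.  With these made-up values a
     * dead peer is detected after roughly 60s idle + 5 probes * 10s, at
     * which point a blocked recvmsg() fails (typically with ETIMEDOUT).
     */
    static int enable_tcp_keepalive(int fd)
    {
        int on = 1, idle = 60, intvl = 10, cnt = 5;

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0) {
            return -1;
        }
        return 0;
    }

Whether to arm keepalive on migration channels, and with what values, is a
policy question; this only shows the mechanism the kernel already provides.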
>
> Daniel, I agree with what you said. But in fact, the destination migration
> connection is not disconnected and is in the CLOSE_WAIT state.
>
> The source qemu process lsof and top:
> # lsof -p 384509
> ...
> kvm 384509 root 112u sock 0,8 0t0 157321811 protocol: TCP
> kvm 384509 root 113u sock 0,8 0t0 157321813 protocol: TCP
> kvm 384509 root 114u sock 0,8 0t0 157321815 protocol: TCP
> kvm 384509 root 115u sock 0,8 0t0 157321817 protocol: TCP
> kvm 384509 root 116u sock 0,8 0t0 157321819 protocol: TCP
> kvm 384509 root 117u sock 0,8 0t0 157321821 protocol: TCP
> kvm 384509 root 118u sock 0,8 0t0 157321823 protocol: TCP
> kvm 384509 root 119u sock 0,8 0t0 157321825 protocol: TCP
>
> # top -H -p 384509
> top - 15:10:22 up 5 days, 18:54, 3 users, load average: 5.16, 4.61, 4.50
> Threads: 8 total, 3 running, 5 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 5.2 us, 5.2 sy, 0.0 ni, 89.3 id, 0.0 wa, 0.1 hi, 0.1 si, 0.0 st
> MiB Mem : 128298.7 total, 41490.2 free, 89470.2 used, 2168.0 buff/cache
> MiB Swap: 42922.0 total, 42910.4 free, 11.6 used. 38828.5 avail Mem
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 384596 root 20 0 11.9g 93516 40112 R 98.7 0.1 261:13.24 CPU 1/KVM
> 384595 root 20 0 11.9g 93516 40112 R 98.0 0.1 56:44.31 CPU 0/KVM
> 384509 root 20 0 11.9g 93516 40112 R 1.3 0.1 7:38.73 kvm
> 384563 root 20 0 11.9g 93516 40112 S 0.0 0.1 0:00.05 kvm
> 384598 root 20 0 11.9g 93516 40112 S 0.0 0.1 0:01.00 vnc_worker
> 1544593 root 20 0 11.9g 93516 40112 S 0.0 0.1 0:00.00 worker
>
> The destination qemu process lsof and top:
> # lsof -p 3236693
> kvm 3236693 root 29u IPv6 159227758 0t0 TCP node18:49156->2.2.2.6:41880 (CLOSE_WAIT)
> kvm 3236693 root 30u IPv6 159227759 0t0 TCP node18:49156->2.2.2.6:41890 (ESTABLISHED)
> kvm 3236693 root 31u IPv6 159227760 0t0 TCP node18:49156->2.2.2.6:41902 (ESTABLISHED)
> kvm 3236693 root 32u IPv6 159227762 0t0 TCP node18:49156->2.2.2.6:41912 (ESTABLISHED)
> kvm 3236693 root 33u IPv6 159227761 0t0 TCP node18:49156->2.2.2.6:41904 (ESTABLISHED)
> kvm 3236693 root 34u IPv6 159227763 0t0 TCP node18:49156->2.2.2.6:41918 (ESTABLISHED)
> kvm 3236693 root 35u IPv6 159227764 0t0 TCP node18:49156->2.2.2.6:41924 (ESTABLISHED)
> kvm 3236693 root 36u IPv6 159227765 0t0 TCP node18:49156->2.2.2.6:41934 (ESTABLISHED)
> kvm 3236693 root 37u IPv6 159227766 0t0 TCP node18:49156->2.2.2.6:41942 (ESTABLISHED)
>
> # top -H -p 3236693
> top - 15:09:25 up 5 days, 19:12, 2 users, load average: 0.63, 0.68, 0.89
> Threads: 15 total, 0 running, 15 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 1.3 us, 0.5 sy, 0.0 ni, 98.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
> MiB Mem : 128452.1 total, 43515.4 free, 87291.7 used, 2527.4 buff/cache
> MiB Swap: 42973.0 total, 42968.4 free, 4.6 used. 41160.4 avail Mem
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3236693 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:00.41 kvm
> 3236714 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:00.00 kvm
> 3236745 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:00.00 CPU 0/KVM
> 3236746 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:00.00 CPU 1/KVM
> 3236748 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:00.00 vnc_worker
> 3236750 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.45 multifdrecv_4
> 3236751 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.42 multifdrecv_5
> 3236752 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.35 multifdrecv_6
> 3236753 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.49 multifdrecv_7
> 3236754 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.45 multifdrecv_1
> 3236755 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.43 multifdrecv_2
> 3236756 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.45 multifdrecv_3
> 3236757 root 20 0 11.3g 100192 38508 S 0.0 0.1 0:01.44 multifdrecv_0
>
> So should we still set the multifd channel socket to non-blocking?
Have you looked at why the timeout didn't work?
After all, QEMU is not the only application that uses recvmsg() like this,
so I wonder whether it's intended or it's a kernel bug that recvmsg()
didn't get kicked out.
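As a generic illustration of that point, any blocking receiver behaves the same
way. The standalone sketch below (with a hypothetical peer address and port)
sits in recvmsg() for as long as the kernel considers the connection alive if
the network is taken down underneath it without a FIN or RST:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        struct sockaddr_in dst = { .sin_family = AF_INET,
                                   .sin_port = htons(4444) };  /* hypothetical port */
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);         /* hypothetical peer */
        if (fd < 0 || connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("connect");
            return 1;
        }

        /*
         * With nothing to read and no timers armed on the socket, this call
         * blocks for as long as the kernel still considers the connection
         * alive, exactly like the multifd recv thread in the backtrace above.
         */
        ssize_t n = recvmsg(fd, &msg, 0);
        printf("recvmsg returned %zd\n", n);
        close(fd);
        return 0;
    }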
>
> > With regards,
> > Daniel
> > --
> > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org -o- https://fstop138.berrange.com :|
> > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from New H3C,
> which is intended only for the person or entity whose address is listed above.
> Any use of the information contained herein in any way (including, but not
> limited to, total or partial disclosure, reproduction, or dissemination) by
> persons other than the intended recipient(s) is prohibited. If you receive this
> e-mail in error, please notify the sender by phone or email immediately and
> delete it!
--
Peter Xu