[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes r
From: |
Eric Blake |
Subject: |
Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected |
Date: |
Mon, 22 Apr 2019 09:51:17 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 |
On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> This is a followup to
>
> v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
> v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
>
> This series comes out of a discussion between myself & Yongji Xie in:
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
>
> I eventually understood that the problem faced was that
> tcp_chr_wait_connected was racing with the background connection attempt
> previously started, causing two connections to be established. This
> broke because some vhost user servers only allow a single connection.
>
> After messing around with the code alot the final solution was in fact
> very easy. We simply have to delay the first background connection
> attempt until the main loop is running. It will then automatically
> turn into a no-op if tcp_chr_wait_connected has been run. This is
> dealt with in the last patch in this series
>
> I believe this should solve the problem Yongji Xie faced, and thus not
> require us to add support for "nowait" option with client sockets at
> all. The reconnect=1 option effectively already implements nowait
> semantics, and now plays nicely with tcp_chr_wait_connected.
>
> In investigating this I found various other bugs that needed fixing and
> identified some useful refactoring to simplify / clarify the code, hence
> this very long series.
Even with this series applied, I'm still seeing sporadic failures of
iotest 169. Max posted a hack patch a while back that tries to work
around the race:
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html
which he originally diagnosed in iotest 147:
https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html
but as it was a hack, he has not pursued it further, and so the symptoms
are still there, although not completely reproducible:
169 10s ... - output mismatch (see 169.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/169.out 2018-11-16
15:48:12.018526748 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad 2019-04-22
09:38:45.481517132 -0500
@@ -1,3 +1,5 @@
+WARNING:qemu:qemu received signal 11:
/home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
-chardev
socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
-mon chardev=mon,mode=control -display none -vga none -qtest
unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
-machine accel=qtest -nodefaults -machine accel=qtest -drive
if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback
Any chance you can take a look as to what a non-hack fix should be?
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature
- Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected,
Eric Blake <=