qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes r


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Mon, 22 Apr 2019 09:51:17 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> This is a followup to
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
>   v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
> 
> This series comes out of a discussion between myself & Yongji Xie in:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
> 
> I eventually understood that the problem faced was that
> tcp_chr_wait_connected was racing with the background connection attempt
> previously started, causing two connections to be established. This
> broke because some vhost user servers only allow a single connection.
> 
> After messing around with the code alot the final solution was in fact
> very easy. We simply have to delay the first background connection
> attempt until the main loop is running. It will then automatically
> turn into a no-op if tcp_chr_wait_connected has been run. This is
> dealt with in the last patch in this series
> 
> I believe this should solve the problem Yongji Xie faced, and thus not
> require us to add support for "nowait" option with client sockets at
> all. The reconnect=1 option effectively already implements nowait
> semantics, and now plays nicely with tcp_chr_wait_connected.
> 
> In investigating this I found various other bugs that needed fixing and
> identified some useful refactoring to simplify / clarify the code, hence
> this very long series.

Even with this series applied, I'm still seeing sporadic failures of
iotest 169. Max posted a hack patch a while back that tries to work
around the race:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html

which he originally diagnosed in iotest 147:
https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html

but as it was a hack, he has not pursued it further, and so the symptoms
are still there, although not completely reproducible:

169 10s ... - output mismatch (see 169.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/169.out        2018-11-16
15:48:12.018526748 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad    2019-04-22
09:38:45.481517132 -0500
@@ -1,3 +1,5 @@
+WARNING:qemu:qemu received signal 11:
/home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
-chardev
socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
-mon chardev=mon,mode=control -display none -vga none -qtest
unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
-machine accel=qtest -nodefaults -machine accel=qtest -drive
if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback

Any chance you can take a look as to what a non-hack fix should be?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]