From: Daniel P. Berrangé
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Tue, 23 Apr 2019 15:13:32 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On Mon, Apr 22, 2019 at 09:51:17AM -0500, Eric Blake wrote:
> On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> > This is a followup to
> > 
> >   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
> >   v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
> > 
> > This series comes out of a discussion between myself & Yongji Xie in:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
> > 
> > I eventually understood that the problem faced was that
> > tcp_chr_wait_connected was racing with the background connection attempt
> > previously started, causing two connections to be established. This
> > broke because some vhost user servers only allow a single connection.
> > 
> > After messing around with the code a lot, the final solution was in
> > fact very easy: we simply have to delay the first background
> > connection attempt until the main loop is running. It then
> > automatically turns into a no-op if tcp_chr_wait_connected has
> > already run. This is dealt with in the last patch in this series.
> > 
> > I believe this should solve the problem Yongji Xie faced, and thus not
> > require us to add support for "nowait" option with client sockets at
> > all. The reconnect=1 option effectively already implements nowait
> > semantics, and now plays nicely with tcp_chr_wait_connected.
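[Editorial note: for context, a hedged example of the option under discussion; the socket path and device IDs below are illustrative, not taken from the thread.]

```shell
# Client chardev with reconnect=1: QEMU retries the connection in the
# background every second instead of blocking at startup, which is the
# "nowait"-like behaviour described above.
qemu-system-x86_64 \
  -chardev socket,id=char0,path=/tmp/vhost-user.sock,reconnect=1 \
  -netdev vhost-user,id=net0,chardev=char0 \
  -device virtio-net-pci,netdev=net0
```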
> > 
> > In investigating this I found various other bugs that needed fixing and
> > identified some useful refactoring to simplify / clarify the code, hence
> > this very long series.
> 
> Even with this series applied, I'm still seeing sporadic failures of
> iotest 169. Max posted a hack patch a while back that tries to work
> around the race:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html
> 
> which he originally diagnosed in iotest 147:
> https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html
> 
> but as it was a hack, he has not pursued it further, and so the symptoms
> are still there, although not completely reproducible:
> 
> 169 10s ... - output mismatch (see 169.out.bad)
> --- /home/eblake/qemu/tests/qemu-iotests/169.out      2018-11-16
> 15:48:12.018526748 -0600
> +++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad  2019-04-22
> 09:38:45.481517132 -0500
> @@ -1,3 +1,5 @@
> +WARNING:qemu:qemu received signal 11:
> /home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
> -chardev
> socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
> -mon chardev=mon,mode=control -display none -vga none -qtest
> unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
> -machine accel=qtest -nodefaults -machine accel=qtest -drive
> if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback
> 
> Any chance you can take a look as to what a non-hack fix should be?

Oh, it looks like we dropped the ball here. We have a fix already but
it doesn't appear to have been merged for 4.0 :-(

  https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06174.html

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


