Re: [Qemu-devel] vhost-user-test failure


From: Maxime Coquelin
Subject: Re: [Qemu-devel] vhost-user-test failure
Date: Mon, 26 Sep 2016 16:07:54 +0200



On 09/26/2016 02:52 PM, Maxime Coquelin wrote:
Hi,

On 09/26/2016 02:13 PM, Eduardo Habkost wrote:
On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
Hi

----- Original Message -----
This time with Marc-André in cc:...

On 09/23/2016 07:40 PM, Maxime Coquelin wrote:


On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
Hi,

I hit a weird vhost-user-test failure on travis-ci recently, on a
branch where I didn't touch any vhost-related code. From a quick
look at the code, it looks like the vhost-user code is unhappy to
see a disconnected socket.

I wasn't able to reproduce it. It seems to be a hard-to-reproduce
race between vhost-user code and socket reconnection.

The failure can be seen at:

https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239

Maxime looked at something similar. Any idea?
No, not really.
Marc-André contributed a lot to these tests, so I am adding him in cc: in
case he has an idea.

I will have a look in the meantime.


I am unable to reproduce locally (over 500x iterations), and I
have no clue what's going on: the warnings there aren't the
problem (that's the main reason why we use the subprocess, to
silence those). Do you have a local reproducer or is it only on
travis? AFAIK, there are no other reports of this test failing.
Are you sure it's not related to changes on your branch?

I don't have a local reproducer, I could only see it once on
travis-ci. Maybe it is not possible to reproduce it if the
machine isn't loaded enough to make the right thread/process be
delayed.

I'm also trying to reproduce it.
Interestingly, launching the test with strace, I reproduce another
problem systematically:
$> strace -o /tmp/vut -ff ./tests/vhost-user-test
/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: Vhost user backend fails to broadcast fake RARP
OK
/x86_64/vhost-user/reconnect: OK

I'll try to load the CPU randomly when executing the test.

FYI, I reproduced it once over ~200 runs while stressing the CPUs:
/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/reconnect: **
ERROR:/home/max/projects/src/mainline/qemu/tests/vhost-user-test.c:715:test_reconnect: child process (/x86_64/vhost-user/reconnect/subprocess [8797]) failed unexpectedly
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
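
For anyone wanting to try the same approach, here is a minimal sketch of a loop-under-load reproducer. It is not the exact setup used for the runs above: the helper name run_under_load, the busy-loop load generator and the default counts are all assumptions made for illustration. It runs a command repeatedly while a few CPU-burning processes compete for scheduling time, and collects the runs that exit with a non-zero status.

```python
import subprocess
import sys


def run_under_load(cmd, iterations=200, burners=4):
    """Run cmd repeatedly while busy-loop processes compete for CPU.

    Hypothetical helper: returns a list of (iteration, returncode, stderr)
    tuples, one for each run that exited with a non-zero status.
    """
    # Assumed load generator: plain busy loops in child interpreters.
    load = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(burners)
    ]
    failures = []
    try:
        for i in range(iterations):
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append((i, result.returncode, result.stderr))
    finally:
        # Always stop the load generators, even if a run raises.
        for proc in load:
            proc.kill()
            proc.wait()
    return failures
```

Calling run_under_load(["./tests/vhost-user-test"]) from the build directory would exercise the test binary used in the strace invocation above, assuming the environment is already set up as for a manual run. A burner count near the number of host CPUs is probably a reasonable starting point, but that is a guess rather than a measured value.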

I'll continue the investigation.

Maxime


