From: Vladimir Sementsov-Ogievskiy
Subject: Re: How to tame CI?
Date: Thu, 5 Oct 2023 15:35:15 +0300
User-agent: Mozilla Thunderbird
On 26.07.23 16:32, Thomas Huth wrote:
> On 26/07/2023 15.00, Peter Maydell wrote:
>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>>> To make things easier, this is the part that show how it breaks (this is the gcov test):
>>>
>>> 357/423 qemu:block / io-qcow2-copy-before-write ERROR 6.38s exit status 1
>>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44 /builds/juan.quintela/qemu/build/pyvenv/bin/python3 /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
>>> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
>>> stderr:
>>> --- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>>> +++ /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>>> @@ -1,5 +1,21 @@
>>> -....
>>> +...F
>>> +======================================================================
>>> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>>> +----------------------------------------------------------------------
>>> +Traceback (most recent call last):
>>> +  File "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
>>> +    self.assertEqual(log, """\
>>> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
>>> +  wrote 524288/524288 bytes at offset 0
>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> +  wrote 524288/524288 bytes at offset 524288
>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> ++ read failed: Permission denied
>>> +- read 1048576/1048576 bytes at offset 0
>>> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> +
>>
>> This iotest failing is an intermittent that I've seen running pullreqs on master. I tend to see it on the s390 host. I suspect a race condition somewhere where it fails if the host is heavily loaded.
>
> It's obviously a failure in an iotest, so let's CC: the corresponding people (done now).
Sorry for the long delay. Does it still fail?

In the test, we expect the copy-before-write operation to fail (because of throttling and the timeout), so that the snapshot is broken and the next read from the snapshot fails. In this run, the copy-before-write operation most probably succeeded for some reason, so the read returned data instead of "Permission denied".

I don't think throttling and timeouts in the block layer can guarantee determinism, but usually it works. We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second, and cbw-timeout is set to 1 second. We write 512K, and then, as the comment "# We need second write to trigger throttling" says, we write another 512K. Once the first 512K are written, the throttle should make us wait 512/300 = 1.7 seconds, counted from the _start_ of that first write, before the second one is issued. But if the first write itself was slow, less than a second of that wait may be left between the end of the first write and the start of the second one. Then the timeout does not fire, the copy-before-write operation succeeds, and the snapshot stays readable.

====

I see two possible ways to fix that:

1. Decrease bps-write a bit, for example to 200 * 1024 (200 KiB per second).
2. Rework the test to use null-co instead of real images. This way we will not suffer from unstable I/O duration.

So, does the problem still fire sometimes?

-- 
Best regards,
Vladimir
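The timing argument above can be checked with a little arithmetic. Below is a minimal Python sketch under a deliberately simplified model, which is an assumption and not QEMU's actual throttling code: the 512K / bps-write budget is counted from the start of the first write, the second write waits only for whatever budget remains once the first write has completed, and the copy-before-write timeout fires only if that remaining wait exceeds cbw-timeout. The helper names (`throttle_delay`, `timeout_fires`, `max_first_write_duration`) are hypothetical, and the first-write durations used with them are illustrative inputs, not measurements.

```python
"""Back-of-the-envelope model of the copy-before-write timing race.

Simplified assumption (not QEMU's real throttle algorithm): the throttle
budget is counted from the moment the first write starts, and the second
write waits only for whatever budget is left when the first one completes.
"""

KiB = 1024
WRITE_SIZE = 512 * KiB   # each of the two writes in the test
CBW_TIMEOUT = 1.0        # cbw-timeout, in seconds


def throttle_delay(write_bytes: int, bps_write: int,
                   first_write_duration: float) -> float:
    """Seconds still left to wait after the first write completes (hypothetical)."""
    budget = write_bytes / bps_write  # 512/300 = ~1.7 s at 300 KiB/s
    return max(0.0, budget - first_write_duration)


def timeout_fires(write_bytes: int, bps_write: int,
                  first_write_duration: float,
                  cbw_timeout: float = CBW_TIMEOUT) -> bool:
    """True if the remaining throttle wait exceeds cbw-timeout (hypothetical)."""
    return throttle_delay(write_bytes, bps_write, first_write_duration) > cbw_timeout


def max_first_write_duration(write_bytes: int, bps_write: int,
                             cbw_timeout: float = CBW_TIMEOUT) -> float:
    """Slowest first write that still leaves more than cbw-timeout of waiting."""
    return write_bytes / bps_write - cbw_timeout


if __name__ == "__main__":
    for bps_kib in (300, 200):  # current setting vs. the proposed 200 * 1024
        slack = max_first_write_duration(WRITE_SIZE, bps_kib * KiB)
        print(f"bps-write = {bps_kib} * 1024: timeout fires only if the "
              f"first write takes less than {slack:.2f}s")
```

Under this model the current setting tolerates a first write of only about 0.71s before the timeout can no longer fire, while 200 * 1024 raises that to about 1.56s, which is the intuition behind option 1. Option 2 (null-co) attacks the other term instead, making the first write's duration more predictable on a loaded host.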