
Re: Race condition in overlayed qcow2?


From: dovgaluk
Subject: Re: Race condition in overlayed qcow2?
Date: Thu, 20 Feb 2020 11:31:51 +0300
User-agent: Roundcube Webmail/1.4.1

Vladimir Sementsov-Ogievskiy wrote on 2020-02-19 19:07:
> 19.02.2020 17:32, dovgaluk wrote:
>> I encountered a problem with record/replay of QEMU execution and tracked it
>> down to the following scenario: QEMU is started with one virtual disk
>> connected to a qcow2 image with the 'snapshot' option applied.

>> The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: introduce
>> parallel subrequest handling in read and write" introduces some kind of race
>> condition that causes differences in the data read from the disk.

>> I detected this by adding the following code, which logs a checksum for every
>> I/O operation. This checksum may differ between runs of the same recorded
>> execution.

>> Logging in the blk_aio_complete function:
>>
>>     qemu_log("%"PRId64": blk_aio_complete\n", replay_get_current_icount());
>>     QEMUIOVector *qiov = acb->rwco.iobuf;
>>     if (qiov && qiov->iov) {
>>         int i;
>>         size_t j;
>>         uint64_t sum = 0;
>>         int count = 0;
>>         /* byte-wise checksum over the whole scatter-gather list */
>>         for (i = 0; i < qiov->niov; ++i) {
>>             for (j = 0; j < qiov->iov[i].iov_len; ++j) {
>>                 sum += ((uint8_t *)qiov->iov[i].iov_base)[j];
>>                 ++count;
>>             }
>>         }
>>         qemu_log("--- iobuf offset %"PRIx64" len %x sum: %"PRIx64"\n",
>>                  acb->rwco.offset, count, sum);
>>     }

>> I tried to get rid of the aio tasks by patching qcow2_co_preadv_part to call
>> the task function directly:
>>
>>     ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes,
>>                                qiov, qiov_offset);
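>>
>> For reference, this is roughly the call site in qcow2_co_preadv_part that
>> the change replaces (quoted from my reading of that commit; the exact
>> qcow2_add_task() argument list may differ in other trees):
>>
>>     -    ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
>>     -                         cluster_offset, offset, cur_bytes,
>>     -                         qiov, qiov_offset, NULL);
>>     +    ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset,
>>     +                               cur_bytes, qiov, qiov_offset);
>>
>> So every subrequest now runs directly in the current coroutine instead of
>> being dispatched to the AioTaskPool workers.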

>> That change fixed the bug, but I have no idea what to debug next to figure
>> out the exact reason for the failure.

>> Do you have any ideas or hints?


> Hi!
>
> Hmm, do you mean that a read from the disk may return wrong data? That would
> be very bad, of course :(
> Could you provide a reproducer, so that I can look at it and debug?

It is just a winxp-32 image. I record the execution and replay it with the following command lines:

qemu-system-i386 -icount shift=7,rr=record,rrfile=replay.bin -m 512M -drive file=xp.qcow2,if=none,id=device-34-file,snapshot -drive driver=blkreplay,if=none,image=device-34-file,id=device-34-driver -device ide-hd,drive=device-34-driver,bus=ide.0,id=device-34 -net none

qemu-system-i386 -icount shift=7,rr=replay,rrfile=replay.bin -m 512M -drive file=xp.qcow2,if=none,id=device-34-file,snapshot -drive driver=blkreplay,if=none,image=device-34-file,id=device-34-driver -device ide-hd,drive=device-34-driver,bus=ide.0,id=device-34 -net none

Replay stalls at some point due to non-determinism of the execution (probably caused by the wrong data being read).

> What exactly is the case? Maybe you have other parallel aio
> operations to the same region?

As far as I understand, all aio operations initiated by the IDE controller are performed one-by-one.
I don't see anything else in the logs.
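
For completeness, the submission side is logged in the same style as the completion logging above (a minimal sketch; the blk_aio_prwv parameter names are from my tree and may differ in yours):

    /* at the top of blk_aio_prwv() in block/block-backend.c:
     * any overlapping submissions to the same region would show up here */
    qemu_log("%"PRId64": blk_aio_prwv offset %"PRIx64" bytes %x\n",
             replay_get_current_icount(), offset, bytes);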

> Ideas to experiment:
>
> 1. change QCOW2_MAX_WORKERS to 1 or to 2, will it help?

1 or 2 are OK, while 4 or 8 lead to failures.
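
For reference, the constant I changed (I believe it lives in block/qcow2.h) and the pool-creation site in qcow2_co_preadv_part that consumes it (quoted from memory, so double-check against your tree):

    #define QCOW2_MAX_WORKERS 8    /* 1 or 2: no corruption; 4 or 8: corruption */

    /* the task pool is only created when the guest request is
     * split into more than one chunk */
    if (!aio && cur_bytes != bytes) {
        aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
    }

With QCOW2_MAX_WORKERS set to 1 the subrequests are effectively serialized, which matches the bug disappearing.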

> 2. understand what the case is in the code: is it a read from one or several
> clusters, is it aligned, what is the type of the clusters, is encryption in
> use, or compression?

There is no encryption, and I think compression is not enabled either.
Clusters are read from the temporary overlay:

blk_aio_prwv
blk_aio_read_entry
bdrv_co_preadv_part complete offset: 26300000 qiov_offset: 1c200 len: 1e00
bdrv_co_preadv_part complete offset: 24723e00 qiov_offset: 0 len: 1c200
bdrv_co_preadv_part complete offset: c0393e00 qiov_offset: 0 len: 1e000
bdrv_co_preadv_part complete offset: c0393e00 qiov_offset: 0 len: 1e000
bdrv_co_preadv_part complete offset: c0393e00 qiov_offset: 0 len: 1e000

Note that the first two completions belong to a single guest request that was split in two (0x1c200 + 0x1e00 = 0x1e000 bytes), so a read can indeed span several parallel subrequests.


> 3. understand what kind of data corruption it is. What do we read instead of
> the correct data? Just garbage, or maybe zeroes, or what?

Most bytes are the same, but some are different (the data looks like NTFS MFT records, note the ASCII 'FILE' signature):

< 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
< 46 49 4c 45 30 00 03 00 18 d1 33 02 00 00 00 00
< 01 00 01 00 38 00 01 00 68 01 00 00 00 04 00 00
< 00 00 00 00 00 00 00 00 04 00 00 00 9d 0e 00 00
< 02 00 00 00 00 00 00 00 10 00 00 00 60 00 00 00
---
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00
> 46 49 4c 45 30 00 03 00 86 78 35 03 00 00 00 00
> 01 00 01 00 38 00 01 00 60 01 00 00 00 04 00 00
> 00 00 00 00 00 00 00 00 04 00 00 00 a1 0e 00 00
> 04 00 00 00 00 00 00 00 10 00 00 00 60 00 00 00

That is strange. I could have thought it was caused by bugs in
deterministic CPU execution, but the first difference in the logs
occurs in a READ operation (I dump read/write buffers in blk_aio_complete).

Maybe I missed logging in one of the functions?

> And of course the best thing would be creating a small reproducer, or a test
> in tests/qemu-iotests.


Pavel Dovgalyuk


