qemu-devel

Race condition in overlayed qcow2?


From: dovgaluk
Subject: Race condition in overlayed qcow2?
Date: Wed, 19 Feb 2020 17:32:40 +0300
User-agent: Roundcube Webmail/1.4.1

Hi!

I encountered a problem with record/replay of QEMU execution and traced it to the following scenario: QEMU is started with one virtual disk connected to a qcow2 image with the 'snapshot' option applied.

The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: introduce parallel subrequest handling in read and write" appears to introduce a race condition that causes differences in the data read from the disk.

I detected this by adding the following code, which logs a checksum of each IO operation. The checksum may differ between runs of the same recorded execution.

logging in blk_aio_complete function:

    qemu_log("%"PRId64": blk_aio_complete\n", replay_get_current_icount());
    QEMUIOVector *qiov = acb->rwco.iobuf;
    if (qiov && qiov->iov) {
        size_t i, j;
        uint64_t sum = 0;
        int count = 0;
        for (i = 0; i < qiov->niov; ++i) {
            for (j = 0; j < qiov->iov[i].iov_len; ++j) {
                sum += ((uint8_t *)qiov->iov[i].iov_base)[j];
                ++count;
            }
        }
        qemu_log("--- iobuf offset %"PRIx64" len %x sum: %"PRIx64"\n",
                 acb->rwco.offset, count, sum);
    }

I tried to get rid of the aio tasks by patching qcow2_co_preadv_part to call the task function synchronously:
ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes, qiov, qiov_offset);

That change made the bug disappear, but I have no idea what to debug next to find the exact cause of the failure.

Do you have any ideas or hints?

Pavel Dovgalyuk


