qemu-devel


From: Stefan Weil
Subject: Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
Date: Sat, 15 Jan 2011 18:27:17 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20101226 Iceowl/1.0b1 Icedove/3.0.11

On 15.01.2011 04:28, Chunqiang Tang wrote:
The community block I/O test suite is qemu-iotests:
http://git.kernel.org/?p=linux/kernel/git/hch/qemu-iotests.git;a=summary
If you have tests that you'd like to contribute, please put them into
that framework so other developers can run them as part of their
regular testing.

Hi Stefan,

What I described is not a qemu-io test case. I also use qemu-io, which is
very helpful, but I observed that qemu-io has several limitations in
discovering elusive bugs:

B1) qemu-io cannot trigger many race condition bugs, because it does not
fully control the timing of events. For example, qemu-io cannot test this
scenario: three concurrent writes a, b, and c are processed by
bdrv_aio_writev() in the order of Pa, Pb, and Pc; their writes are
actually persisted on disk in another order of Wc, Wa, and Wb; and finally
their callbacks are invoked in yet another order of Vb, Vc, and Va. Some
race condition bugs may exist in the code (e.g., inappropriate locking)
because the code does not anticipate that these orders of events are
possible. This is just one example. In theory, there can be 100 concurrent
reads or writes, and their events can happen in an arbitrary permutation
order. It is nearly impossible to manually generate test cases for all of
them.
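To make the combinatorics in B1 concrete, here is a small C sketch (not part of the FVD patch) that counts the possible global orders of the P, W, and V events. It assumes, as a simplification, that each request's own events always occur in the order P, then W, then V, while events of different requests interleave freely; binom and count_event_orders are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Binomial coefficient C(n, k). After step i, r equals C(n - k + i, i),
 * so every division is exact. */
static uint64_t binom(unsigned n, unsigned k)
{
    uint64_t r = 1;

    assert(k <= n);
    for (unsigned i = 1; i <= k; i++) {
        r = r * (n - k + i) / i;
    }
    return r;
}

/* Number of distinct global orders of the events of n concurrent requests,
 * assuming each request's own events always occur as P (submitted), then
 * W (persisted), then V (callback). Adding request k places its 3 events
 * among 3k positions in C(3k, 3) ways; their relative order is fixed. */
static uint64_t count_event_orders(unsigned n)
{
    uint64_t total = 1;

    for (unsigned k = 1; k <= n; k++) {
        total *= binom(3 * k, 3);
    }
    return total;   /* overflows uint64_t well before n = 100 */
}
```

Under that assumption, the three writes a, b, and c already admit 1680 distinct event orders, four requests admit 369600, and the count exceeds the range of uint64_t by n = 10, which is why enumerating such cases by hand does not scale.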

B2) Even if a race condition bug is triggered by chance, its behavior
depends on subtle event timing that is hard to repeat and hence hard to
debug.

B3) With qemu-io, it is hard to test code paths that handle I/O failures.
For example, a disk write may fail due to disk media error. Because these
errors are rare, the failure handling code paths may never be tested.
Such paths may, for example, contain a null pointer bug that crashes the
entire VM, or gradually leak resources (e.g., memory) due to incomplete
cleanup.

B4) qemu-io requires manually creating test cases, which is not only
time-consuming but also leads to low test coverage. This is because many
bugs happen in scenarios that the developers do not anticipate, and hence
they do not know how to create test cases for them in the first place.

The FVD patch includes a new testing framework that addresses the above
issues. This testing framework is orthogonal to FVD and can be used to
test other block device drivers as well. It includes two components that
can be used either separately or in combination:

T1) To address problems B1-B3, I implemented an emulated disk in
block/sim.c, which allows full control of event timings, either manually
or automatically. Given the three concurrent writes example above, their 9
events (Pa, Pb, Pc, Wa, Wb, Wc, Va, Vb, and Vc) can be precisely
controlled to be executed in any given order. Moreover, the emulated disk
can inject disk I/O errors in a controlled manner. For example, it can
fail a specific read or write to test how the code handles that, or it can
even fail as many as 90% of the reads/writes to test if the code has
resource leaks. qemu-io is extended with a module qemu-io-sim.c to work
with the emulated disk block/sim.c, so that the tester can use the qemu-io
console to manually control the order of events or fail disk reads or
writes.
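The core idea of T1, holding requests so that their persist (W) and completion (V) events can be released in any chosen order, and failing selected ones, can be illustrated with a minimal in-memory model. This is a hypothetical sketch: SimDisk, sim_submit_write, sim_persist, and sim_invoke_callback are invented names and do not reflect the actual interface of block/sim.c or the QEMU block layer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIM_DISK_SIZE   4096
#define SIM_MAX_PENDING 16
#define SIM_MAX_WRITE   64

typedef void SimCallback(void *opaque, int ret);

/* One in-flight write. Persisting it (W) and completing it (V) are separate
 * steps so that the tester can order them independently. */
typedef struct {
    int in_use;
    int persisted;
    int ret;                        /* 0 or -EIO, decided at persist time */
    size_t offset, len;
    unsigned char data[SIM_MAX_WRITE];
    SimCallback *cb;                /* may be NULL */
    void *opaque;
} SimRequest;

typedef struct {
    unsigned char media[SIM_DISK_SIZE];
    SimRequest req[SIM_MAX_PENDING];
} SimDisk;

/* Event P: accept a write and return its id; the media is not touched yet. */
static int sim_submit_write(SimDisk *d, size_t off, const void *buf,
                            size_t len, SimCallback *cb, void *opaque)
{
    assert(len <= SIM_MAX_WRITE && off + len <= SIM_DISK_SIZE);
    for (int i = 0; i < SIM_MAX_PENDING; i++) {
        SimRequest *r = &d->req[i];
        if (!r->in_use) {
            r->in_use = 1;
            r->persisted = 0;
            r->ret = 0;
            r->offset = off;
            r->len = len;
            memcpy(r->data, buf, len);
            r->cb = cb;
            r->opaque = opaque;
            return i;
        }
    }
    return -EBUSY;
}

/* Event W: persist request id, or inject a media error instead. */
static void sim_persist(SimDisk *d, int id, int inject_error)
{
    assert(id >= 0 && id < SIM_MAX_PENDING);
    SimRequest *r = &d->req[id];
    assert(r->in_use && !r->persisted);
    if (inject_error) {
        r->ret = -EIO;              /* data never reaches the media */
    } else {
        memcpy(d->media + r->offset, r->data, r->len);
    }
    r->persisted = 1;
}

/* Event V: deliver the completion for request id and free its slot. */
static int sim_invoke_callback(SimDisk *d, int id)
{
    assert(id >= 0 && id < SIM_MAX_PENDING);
    SimRequest *r = &d->req[id];
    assert(r->in_use && r->persisted);
    int ret = r->ret;
    r->in_use = 0;
    if (r->cb) {
        r->cb(r->opaque, ret);
    }
    return ret;
}
```

Replaying the B1 example is then a matter of calling sim_persist in the order c, a, b and sim_invoke_callback in the order b, c, a; passing a nonzero inject_error to a chosen sim_persist call models a media error on that particular write.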

T2) The solution in T1 still does not address problem B4), i.e.,
manually generating test cases is time-consuming and has low coverage.
This problem is solved by a new testing tool called qemu-test. qemu-test
can 1) automatically generate an unlimited number of randomized test cases
that, e.g., execute 1,000 concurrent disk reads or writes on overlapping
disk regions; 2) automatically generate the corresponding anticipated
correct results, automatically run the tests, and automatically compare
the actual test results with the anticipated correct results. Once it
discovers a difference, which indicates a bug, it halts testing and waits
for the developer to debug. The randomized test cases created by
qemu-test are controlled by a pseudo random number generator, and hence
the behavior is completely repeatable. Therefore, once a bug is triggered,
it can be precisely repeated an unlimited number of times to facilitate
debugging, even if the bug happens extremely rarely in real runs of a VM.
qemu-test is fully automated. Once started, it can run continuously, e.g.,
for months, to test an enormous number of test cases.
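Repeatability comes from deriving every random choice from a single seed. The sketch below shows the idea with SplitMix64, a small, well-known PRNG chosen here only because, unlike rand(), its output does not depend on the C library in use. The IoRequest layout and gen_requests are assumptions for illustration, not qemu-test's actual generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64: a tiny PRNG whose whole state is one 64-bit word, so the
 * seed fully determines every value it produces. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical request description: read or write of len bytes at offset. */
typedef struct {
    int is_write;
    uint64_t offset;
    uint64_t len;
} IoRequest;

/* Fill out[0..n) with random reads/writes that stay inside the disk. */
static void gen_requests(uint64_t seed, uint64_t disk_size, uint64_t max_len,
                         IoRequest *out, size_t n)
{
    uint64_t state = seed;

    assert(max_len > 0 && max_len <= disk_size);
    for (size_t i = 0; i < n; i++) {
        out[i].is_write = (int)(splitmix64(&state) & 1);
        out[i].len = 1 + splitmix64(&state) % max_len;
        out[i].offset = splitmix64(&state) % (disk_size - out[i].len + 1);
    }
}
```

Calling gen_requests twice with the same seed yields identical request streams, so a failing run can be replayed exactly, while a different seed explores a different part of the state space.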

The implementation of qemu-test is actually not that complicated. It opens
two virtual disks: the so-called truth image and the test image.
The truth image is served by a trivial synchronous block device driver so
that its behavior is guaranteed to be correct. The test image is served by
a real block device driver (e.g., FVD or QCOW2) that we want to test.
qemu-test submits the same sequence of disk I/O requests (which is
randomly generated) to the truth image and the test image, and expects
the two images’ contents never to diverge. Any divergence indicates a bug
in the test image’s block device driver. qemu-test works with the emulated
disk block/sim.c so that it can randomize event timings in a controlled
manner and can inject disk I/O errors randomly.
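A stripped-down, synchronous version of this truth-versus-test comparison might look as follows. Everything here is hypothetical: a toy cluster-allocating driver stands in for FVD or QCOW2, a simple linear congruential generator stands in for qemu-test's randomization, and, unlike qemu-test, the sketch does not go through block/sim.c, so it checks only data correctness, not event timing or error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMG_SIZE  8192
#define CLUSTER   512
#define NCLUSTERS (IMG_SIZE / CLUSTER)

/* Truth image: a flat buffer updated synchronously, correct by construction. */
typedef struct { uint8_t data[IMG_SIZE]; } TruthImage;

/* Toy driver under test: clusters are allocated on first write; unallocated
 * clusters read back as zeros, as if backed by an all-zero base image. */
typedef struct {
    int map[NCLUSTERS];              /* -1 = unallocated, else pool index */
    uint8_t pool[NCLUSTERS][CLUSTER];
    int next_free;
} TestImage;

static void test_write(TestImage *t, size_t off, const uint8_t *buf, size_t len)
{
    while (len > 0) {
        size_t c = off / CLUSTER, in = off % CLUSTER;
        size_t n = (CLUSTER - in < len) ? CLUSTER - in : len;
        if (t->map[c] < 0) {         /* allocate, copying zeros from "backing" */
            t->map[c] = t->next_free++;
            memset(t->pool[t->map[c]], 0, CLUSTER);
        }
        memcpy(t->pool[t->map[c]] + in, buf, n);
        off += n; buf += n; len -= n;
    }
}

static void test_read(const TestImage *t, size_t off, uint8_t *buf, size_t len)
{
    while (len > 0) {
        size_t c = off / CLUSTER, in = off % CLUSTER;
        size_t n = (CLUSTER - in < len) ? CLUSTER - in : len;
        if (t->map[c] < 0) {
            memset(buf, 0, n);
        } else {
            memcpy(buf, t->pool[t->map[c]] + in, n);
        }
        off += n; buf += n; len -= n;
    }
}

/* Returns the first offset where the images differ, or -1 if identical. */
static long compare_images(const TruthImage *truth, const TestImage *test)
{
    uint8_t tmp[IMG_SIZE];

    test_read(test, 0, tmp, IMG_SIZE);
    for (size_t i = 0; i < IMG_SIZE; i++) {
        if (tmp[i] != truth->data[i]) {
            return (long)i;
        }
    }
    return -1;
}

/* Send the same pseudo-random writes (LCG-driven) to both images and stop
 * at the first divergence, which would indicate a bug in the test driver. */
static long run_differential(uint32_t seed, int rounds,
                             TruthImage *truth, TestImage *test)
{
    uint8_t buf[1024];

    memset(truth, 0, sizeof *truth);
    for (int c = 0; c < NCLUSTERS; c++) {
        test->map[c] = -1;
    }
    test->next_free = 0;
    for (int r = 0; r < rounds; r++) {
        seed = seed * 1664525u + 1013904223u;
        size_t len = 1 + (seed >> 8) % sizeof buf;
        seed = seed * 1664525u + 1013904223u;
        size_t off = (seed >> 8) % (IMG_SIZE - len + 1);
        memset(buf, (uint8_t)(r + 1), len);
        memcpy(truth->data + off, buf, len);   /* trivial, trusted write */
        test_write(test, off, buf, len);       /* driver under test */
        long bad = compare_images(truth, test);
        if (bad >= 0) {
            return bad;                        /* halt here for debugging */
        }
    }
    return -1;
}
```

run_differential returns -1 when the two images stayed identical for all rounds, and otherwise the first diverging byte offset, at which point a real harness would stop and wait for a developer.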

I found qemu-test extremely powerful in discovering elusive bugs that I
never anticipated, and using qemu-test is effortless. Whenever I completed
some major code upgrade, I simply started qemu-test in the evening and
came back in the morning to collect bugs, if any. Debugging them is also
easy because the bugs are precisely repeatable even if they are hard to
trigger.

As for the QCOW2 bug I mentioned previously, it can be triggered by
test-qcow2.sh. A faster way to trigger it is to skip the earlier test runs
that pass and execute the commands below directly:

dd if=/dev/zero of=/var/ramdisk/truth.raw count=0 bs=1 seek=1155683840
dd if=/dev/zero of=/var/ramdisk/zero-500M.raw count=0 bs=1 seek=609064448
./qemu-img create -f qcow2 -b /var/ramdisk/zero-500M.raw \
    /var/ramdisk/test.qcow2 1155683840
./qemu-test --seed=116579177 --truth=/var/ramdisk/truth.raw \
    --test=/var/ramdisk/test.qcow2 --verify_write=true --compare_before=false \
    --compare_after=true --round=100000 --parallel=100 --io_size=10485760 \
    --fail_prob=0 --cancel_prob=0 --instant_qemubh=true
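The two dd commands above copy no data (count=0); they only set the length of the output files to the seek offset, which yields sparse files of the given sizes on file systems that support holes. For doing the same from a program rather than a shell, a minimal POSIX sketch with open() and ftruncate() could look like this; make_sparse_file is a hypothetical helper, not part of qemu-test or qemu-img.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the length of path to exactly size bytes without writing any data,
 * creating the file if needed. Like dd with seek=SIZE and without
 * conv=notrunc, existing bytes below SIZE are kept and the file is cut or
 * extended to SIZE; an extended range is a hole, not allocated zeros. */
static int make_sparse_file(const char *path, off_t size)
{
    assert(size >= 0);
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For example, make_sparse_file("/var/ramdisk/truth.raw", 1155683840) corresponds to the first dd command.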

As for the FVD patch that includes the new testing framework, I tried to
post it to the mailing list twice, but it was bounced back both times,
either because the message is too big or because of a Notes client
configuration issue. Until I figure it out, please download the FVD patch
from
https://researcher.ibm.com/researcher/files/us-ctang/FVD-01-14-2011.patch

Best regards,
ChunQiang (CQ) Tang, Ph.D.
Homepage: http://www.research.ibm.com/people/c/ctang

Hi,

when I tried to use your patch, I found several problems:

* The patch does not apply cleanly to the latest QEMU.
  This is caused by recent changes in QEMU git master.

* The new code uses tabs instead of spaces (QEMU coding rules require spaces).

* Some lines of the new code end with trailing whitespace.

* The patch adds empty lines at the end of some files.

The last two points are reported by newer versions of git
(which refuse to take such patches with the default setting).

Could you please update your patch to fix these issues?
I'd like to apply it to my QEMU code and try the new FVD.
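As a rough illustration of the three mechanical issues listed above, a tiny checker that counts lines containing tabs, lines with trailing whitespace, and blank lines at the end of a file might look like this. It is a hypothetical sketch that works on one file already loaded into memory; StyleReport and check_style are invented names, and the sketch is not a substitute for git's own whitespace checks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int tab_lines;             /* lines containing a tab character */
    int trailing_ws_lines;     /* lines ending in a space or tab */
    int trailing_blank_lines;  /* empty or whitespace-only lines at EOF */
} StyleReport;

/* Scan a NUL-terminated buffer holding the whole file, line by line. */
static StyleReport check_style(const char *text)
{
    StyleReport rep = {0, 0, 0};
    const char *line = text;

    assert(text != NULL);
    while (*line != '\0') {
        const char *end = line;
        int has_tab = 0, blank = 1;

        while (*end != '\0' && *end != '\n') {
            if (*end == '\t') {
                has_tab = 1;
            }
            if (*end != ' ' && *end != '\t') {
                blank = 0;
            }
            end++;
        }
        if (has_tab) {
            rep.tab_lines++;
        }
        if (end > line && (end[-1] == ' ' || end[-1] == '\t')) {
            rep.trailing_ws_lines++;
        }
        /* Track the current run of blank lines; only a run that reaches
         * the end of the file survives the loop. */
        rep.trailing_blank_lines = blank ? rep.trailing_blank_lines + 1 : 0;
        line = (*end == '\n') ? end + 1 : end;
    }
    return rep;
}
```

A clean file such as "int x;\n    return 0;\n" yields all zero counts, while "int x; \n\treturn 0;\n\n\n" reports one tab line, one line with trailing whitespace, and two blank lines at the end.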

If needed, I could also send your patch to qemu-devel.

Kind regards,
Stefan Weil
