qemu-block

From: Stefan Hajnoczi
Subject: Re: [Qemu-block] New iotest repros failures on virtio external snapshot with iothread
Date: Mon, 3 Apr 2017 17:57:55 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

On Wed, Mar 29, 2017 at 07:01:38PM -0700, Ed Swierk wrote:
> Parts of qemu's block code have changed a lot in recent months but are
> not well exercised by current tests.
> 
> Subtle bugs have crept in causing assertion failures, hangs and other
> crashes in a variety of situations: immediately on start, on first
> guest activity, on external snapshot create or commit, on qmp quit
> command.
> 
> Reproducing these bugs has proved tricky, as each may occur only with
> a specific combination of qemu version, block device type (virtio-blk
> or virtio-scsi) and iothread enabled or not. In some cases the bug
> occurs only after several external snapshot operations. And in some
> cases the bug only manifests when a guest is accessing the block
> device simultaneously.
> 
> I've written an iotest (number 176, for now) that attempts to cover
> many of these configurations. Currently it only exercises the external
> snapshot create and commit lifted from iotest 118. The new iotest does
> this repeatedly in each of 16 combinations:
> - no guest / guest
> - virtio-blk / virtio-scsi
> - no iothread / iothread
> - single / repeated external snapshot create+commit
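The four two-valued axes above give 2^4 = 16 combinations. The following is a minimal Python sketch of enumerating them. The field and function names are hypothetical, and this is not the code in Ed's branches:

```python
import itertools
from typing import NamedTuple


class Config(NamedTuple):
    """One test configuration (field names are hypothetical)."""
    guest: bool      # run a real Linux guest that writes to the disk
    device: str      # "virtio-blk" or "virtio-scsi"
    iothread: bool   # attach the device to a dedicated iothread
    repeated: bool   # repeat external snapshot create+commit


def all_configs():
    """Enumerate every combination of the four axes."""
    axes = (
        (False, True),                    # no guest / guest
        ("virtio-blk", "virtio-scsi"),    # device type
        (False, True),                    # no iothread / iothread
        (False, True),                    # single / repeated
    )
    return [Config(*combo) for combo in itertools.product(*axes)]
```

Naming each axis explicitly makes it easy to report which combination failed. That matters here because the failures below depend on the exact combination.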

Thanks Ed!  This has a lot of potential.  I see four different issues
that can be discussed separately:

1. Urgent 2.9 bug fix for `ctx->external_disable_cnt > 0' failed
assertion.  I believe you've already started a separate email thread
about it called "Assertion failure taking external snapshot with virtio
drive + iothread".

2. QEMU 2.8 stable hang.  Less urgent but worth understanding, perhaps
via git-bisect against QEMU 2.9.

3. Minor iotest enhancements.  Please send a separate patch series.

4. How to automate tests with real Linux guests?  This is a complex
topic and probably what we should discuss in this email thread.
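For item 2, `git bisect run` reads the verdict from a script's exit status. An exit status of 0 means good, 125 means "skip this commit", any other value from 1 to 127 means bad, and 128 or above aborts the bisection. The following is a minimal sketch of mapping test outcomes onto those codes. It assumes a hypothetical outcome vocabulary ("pass", "fail", "hang") and treats a hang as bad:

```python
# Exit statuses understood by `git bisect run`.
GOOD = 0
BAD = 1
SKIP = 125   # commit cannot be tested, e.g. it does not build


def bisect_exit_code(built: bool, outcome: str) -> int:
    """Map one test run to a `git bisect run` exit status.

    `outcome` is a hypothetical label: "pass", "fail" or "hang".
    """
    if not built:
        return SKIP
    if outcome == "pass":
        return GOOD
    if outcome in ("fail", "hang"):
        return BAD
    raise ValueError(f"unknown outcome: {outcome!r}")
```

A wrapper script would then end with `sys.exit(bisect_exit_code(built, outcome))`.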

The buildroot + busybox approach is good for a small set of sanity
tests.  There was a similar attempt here:
https://github.com/stsquad/qemu-jeos

Building from source becomes a challenge when other people want to add
software to test other areas of QEMU.  The process also requires
attention to maintain the image over time (e.g. as host build
environments change).

There are image builder tools like virt-builder and mkosi for building
bootable virtual machine images based on standard Linux distros:
http://libguestfs.org/virt-builder.1.html
https://github.com/systemd/mkosi

This eliminates the build-from-source hassles and gives us a full Linux
guest environment.  Booting is very fast with mkosi, so the advantage of
custom-building a minimal image is negligible.

My suggestion is:

Let's pick an image builder tool like virt-builder and keep a single
build script per guest architecture (e.g.  build-test-os-x86_64.sh).
All tests for that architecture run against the same disk image.

It's easy to add additional software to the disk image by modifying the
build script.
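A build script of this kind might wrap a single virt-builder invocation. The sketch below only assembles the command line. The template name, image size and package list are hypothetical placeholders. The options it uses (`--arch`, `--format`, `--size`, `--install`, `-o`) should be checked against the virt-builder manual before relying on them:

```python
def virt_builder_argv(arch, output, packages,
                      template="fedora-25", size="4G"):
    """Assemble a virt-builder command line for one guest architecture.

    The default template name and size are illustrative assumptions.
    Adding software to the test image means extending `packages`.
    """
    argv = [
        "virt-builder", template,
        "--arch", arch,
        "--format", "qcow2",
        "--size", size,
        "-o", output,
    ]
    if packages:
        # virt-builder takes a comma-separated package list.
        argv += ["--install", ",".join(packages)]
    return argv
```

For example, `virt_builder_argv("x86_64", "test-os-x86_64.qcow2", ["fio"])` returns a list that can be passed straight to `subprocess.run`.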

A Makefile ensures that the image file gets rebuilt if the build script
has changed.
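The Makefile rule amounts to a timestamp comparison: rebuild when the image is missing or older than the script that produces it. The following is a minimal Python equivalent of that check. The file names are hypothetical and follow the build-test-os-<arch>.sh pattern suggested above:

```python
import os


def build_script_for(arch: str) -> str:
    """One build script per guest architecture, e.g. build-test-os-x86_64.sh."""
    return f"build-test-os-{arch}.sh"


def needs_rebuild(script: str, image: str) -> bool:
    """Return True if `image` is missing or older than `script`, as make would."""
    if not os.path.exists(image):
        return True
    return os.path.getmtime(script) > os.path.getmtime(image)
```

Like make, this ignores changes to anything other than the script itself. For example, a newer upstream template would not trigger a rebuild.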

> 
> I made some minor changes to the test infrastructure so the new iotest
> can deal gracefully with qemu hanging--the test script itself
> shouldn't hang. And in all failure modes the test needs to expose
> enough console output and other information to diagnose the problem.
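One way to keep a test harness from hanging along with QEMU is to run the child process under a wall-clock timeout and keep whatever console output it produced. The following is a minimal Python sketch that uses only the standard `subprocess` module. The helper name is hypothetical, and this is not the change made in Ed's branches:

```python
import subprocess


def run_with_timeout(argv, timeout):
    """Run argv, killing it if it runs longer than `timeout` seconds.

    Returns (timed_out, output), where output is whatever the process
    wrote to stdout/stderr before exiting or being killed.
    """
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=timeout)
        return False, proc.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() has already killed the child. The partial
        # output may be None or bytes, depending on platform and timing.
        partial = exc.output or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return True, partial
```

Returning the partial output on timeout is the important part. It gives a hang the same diagnostic value as a crash.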
> 
> The main departure from existing iotests is running a real guest. I
> used buildroot to generate a small (~4 MB) Linux kernel with built-in
> initrd containing a busybox-based userland. After the iotest launches
> qemu, the guest loops writing to the block device, while the test
> performs snapshot operations.
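The snapshot operations can be expressed as QMP commands. The following is a minimal sketch of one external snapshot create+commit cycle. It assumes a device named drive0 and qcow2 overlays, both hypothetical:

- `blockdev-snapshot-sync` puts a new overlay on top of the active image.
- `block-commit` without a `top` argument commits the active layer back into its backing file.
- Active-layer commit does not finish on its own, so the job must be completed with `block-job-complete` once QEMU reports it ready.

The sketch only builds the command dictionaries. Sending them and waiting for the BLOCK_JOB_READY event is left to the test's QMP client:

```python
import json


def snapshot_commit_cycle(device, overlay_path):
    """QMP commands for one external snapshot create+commit cycle.

    The caller must wait for BLOCK_JOB_READY between block-commit and
    block-job-complete.
    """
    return [
        {"execute": "blockdev-snapshot-sync",
         "arguments": {"device": device,
                       "snapshot-file": overlay_path,
                       "format": "qcow2"}},
        # No "top" argument: commit the active layer into its backing file.
        {"execute": "block-commit",
         "arguments": {"device": device}},
        {"execute": "block-job-complete",
         "arguments": {"device": device}},
    ]


def repeated_cycles(device, overlay_pattern, iterations):
    """Repeat the cycle, using a fresh overlay file each time."""
    commands = []
    for i in range(iterations):
        commands += snapshot_commit_cycle(device, overlay_pattern.format(i))
    return commands


if __name__ == "__main__":
    for cmd in repeated_cycles("drive0", "overlay-{}.qcow2", 2):
        print(json.dumps(cmd))
```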
> 
> I ran the new iotest on 3 qemu versions: 2.7.1, stable-2.8-staging and
> 2.9.0-rc2. The latter two fail several test cases, all
> iothread-enabled. Only 2.7.1 passes all the cases.
> 
> Here is the code for the new iotest (I didn't dare email patches with
> a 4 MB blob):
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.7
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.8
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9
> 
> And here is the buildroot I used to generate the guest Linux kernel+initrd:
> https://github.com/skyportsystems/buildroot-1/commits/qemu-iotests
> 
> Please check out the code and try the new test--particularly anyone
> who can also help figure out these failures. (Note that since half the
> test cases use an iothread, /dev/kvm must be readable and writable.)
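A test runner could check this requirement up front and skip the iothread cases, rather than fail them, when it is not met. The following is a minimal sketch. The skip policy and function names are assumptions and are not part of the posted test:

```python
import os


def kvm_accessible(path="/dev/kvm"):
    """Return True if `path` exists and is readable and writable by us."""
    return os.access(path, os.R_OK | os.W_OK)


def skip_reason(iothread, have_kvm):
    """Hypothetical policy: skip iothread cases when KVM is unavailable."""
    if iothread and not have_kvm:
        return "iothread case needs read/write access to /dev/kvm"
    return None
```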
> 
> * stable-2.8-staging
> - guest, virtio-blk, iothread, single snapshot create+commit: hang on
> quit (intermittent)
> - guest, virtio-blk, iothread, repeated snapshot create+commit: hang
> after 1 iteration
> - guest, virtio-scsi, iothread, single snapshot create+commit: hang on
> quit (intermittent)
> - guest, virtio-scsi, iothread, repeated snapshot create+commit: hang
> after 1 iteration
> 
> * 2.9.0-rc2
> - guest, virtio-blk, iothread, single snapshot create+commit:
> "include/block/aio.h:457: aio_enable_external: Assertion
> `ctx->external_disable_cnt > 0' failed." after snapshot create
> - guest, virtio-blk, iothread, repeated snapshot create+commit: same as above
> - guest, virtio-scsi, iothread, single snapshot create+commit: same as above
> - guest, virtio-scsi, iothread, repeated snapshot create+commit: same as above
> - no guest, virtio-blk, iothread, repeated snapshot create+commit: same as above
> - no guest, virtio-scsi, iothread, single snapshot create+commit: same as above
> - no guest, virtio-scsi, iothread, repeated snapshot create+commit: same as above
> 
> --Ed
> 
