qemu-devel

Re: [Qemu-devel] drive_del vs. device_del: what should come first?


From: Heinz Graalfs
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
Date: Fri, 25 Apr 2014 15:13:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 23/04/14 15:01, Stefan Hajnoczi wrote:
On Fri, Apr 11, 2014 at 02:47:20PM +0200, Heinz Graalfs wrote:
Hello Markus,

I finally managed to reproduce the problem, at least once ...

The scenario was:
dd if=/dev/vdx1 of=partitionone

followed by a virsh detach... (with the device_del() under the covers)
during active dd processing
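
Roughly, the reproduction looks like the sketch below (the guest disk
/dev/vdx1 is the one from the scenario above; the domain name "guest1"
and the device XML file "disk.xml" are placeholders for whatever was
used to attach the disk):

  # inside the guest: keep the virtio-blk disk busy with reads
  dd if=/dev/vdx1 of=partitionone

  # on the host, while the dd is still running: detach the disk
  # (libvirt translates this into device_del followed by drive_del)
  virsh detach-device guest1 disk.xml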

dmesg shows:

[79026.220718] User process fault: interruption code 0x40009 in qemu-system-s390x[80000000+43c000]
[79026.220737] CPU: 3 PID: 20117 Comm: qemu-system-s39 Not tainted 3.10.33+ #4
[79026.220742] task: 0000000085930000 ti: 00000000e0b74000 task.ti: 00000000e0b74000
[79026.220746] User PSW : 0705000180000000 0000000080269722 (0x80269722)
[79026.220774]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 EA:3
               User GPRS: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[79026.220827]            0000000000000000 0000000000000000 00000000803761dc 00000000809001d0
[79026.220831]            000003fffd0391a8 00000000808ff560 00000000808eb020 000003ffffc03838
[79026.220834]            0000000000000000 0000000080375e88 00000000802696f6 000003ffffc03838
[79026.220847] User Code: 0000000080269716: b9040034      lgr     %r3,%r4
                          000000008026971a: a7290000      lghi    %r2,0
                         #000000008026971e: b9870021      dlgr    %r2,%r1
                         >0000000080269722: b9040012      lgr     %r1,%r2
                          0000000080269726: 5010b0a0      st      %r1,160(%r11)
                          000000008026972a: 5820b0a0      l       %r2,160(%r11)
                          000000008026972e: e310b0a80004  lg      %r1,168(%r11)
                          0000000080269734: 58101000      l       %r1,0(%r1)
[79026.220875] Last Breaking-Event-Address:
[79026.220878]  [<000000008026904c>] 0x8026904c

with PSW addr 0000000080269722 pointing into virtio_queue_empty:

...
000000008026a67c t virtio_notify_vector
00000000802694a4 T virtio_queue_empty
000000008026b440 T virtio_queue_get_addr
...

and coming from:
R14     00000000802696f6 virtqueue_fill
...
0000000080269f74 T virtqueue_avail_bytes
0000000080269540 T virtqueue_fill
000000008026979c T virtqueue_flush
0000000080269c2c T virtqueue_get_avail_bytes
...

see the complete core_backtrace below ...

On 02/04/14 19:40, Markus Armbruster wrote:
Heinz Graalfs <address@hidden> writes:

On 01/04/14 17:48, Markus Armbruster wrote:
Heinz Graalfs <address@hidden> writes:

Hi Kevin,

doing a

       virsh detach-device ...

ends up in the following QEMU monitor commands:

1. device_del ...
2. drive_del ...

qmp_device_del() performs the device unplug path.
In case of a block device do_drive_del() tries to
prevent further IO against the host device.

However, bdrv_find() during drive_del() results in
an error, because the device is already gone. Due to
this error, all the bdrv_xxx calls that would quiesce the block
driver, as well as all other processing, are skipped.
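
For illustration, the same sequence issued by hand through the monitor
would look roughly like this (the domain "guest1", device id
"virtio-disk1" and drive id "drive-virtio-disk1" are hypothetical
names; the exact error text may vary):

  # what libvirt effectively does, issued via the HMP monitor
  virsh qemu-monitor-command --hmp guest1 'device_del virtio-disk1'
  # the unplug implicitly releases the drive as well, so the follow-up
  # drive_del no longer finds it and reports an error
  virsh qemu-monitor-command --hmp guest1 'drive_del drive-virtio-disk1'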

Is the sequence that libvirt triggers OK?
Shouldn't drive_del be executed first?

No.

OK, I see. The drive is deleted implicitly (release_drive()).
Getting the device back after a device_del() requires another drive_add() AND device_add().

Yes.  The automatic drive delete on unplug is a wart that will not be
preserved in the new interfaces we're building now.

(Doing just a device_add() complains about the missing drive.
A subsequent info qtree makes QEMU abort.)

Really?  Got a reproducer for me?

drive_del is nasty.  Its purpose is to revoke access to an image even
when the guest refuses to cooperate.  To the guest, this looks like
hardware failure.

Deleting a device during active IO is nasty and it should look like a
hardware failure. I would expect lots of errors.


If you drive_del before device_del, even a perfectly well-behaved guest
is exposed to a terminally broken device between drive_del and
completion of unplug.

The early drive_del() would mean that no further IO would be
possible.

A polite way to describe the effect of drive_del on the guest.

drive_del makes all attempts at actual I/O error out without any
forewarning whatsoever.  If you do that to your guest, and something
breaks, you get to keep the pieces.  Even if you "only" do it for a
short period of time.

Always try a device_del first, and only if that does not complete within
reasonable time, and you absolutely must revoke access to the image,
then whack it over the head with drive_del.

What is this reasonable time?

Depends on how heavily loaded host and guest are.  Suggest to measure
with a suitably thrashing guest, then shift left to taste.
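
In script form, the device_del-first approach with drive_del only as a
last resort might look like the sketch below (domain, device and drive
names are the same hypothetical ones as above, and the 30-second
timeout is an arbitrary placeholder; since QEMU 1.5 a QMP
DEVICE_DELETED event signals completion of the unplug and could be
watched instead of polling):

  # ask the guest to release the device first
  virsh qemu-monitor-command --hmp guest1 'device_del virtio-disk1'

  # wait up to 30 seconds for the unplug (and the automatic drive
  # delete) to complete; the drive then disappears from 'info block'
  for i in $(seq 1 30); do
      virsh qemu-monitor-command --hmp guest1 'info block' \
          | grep -q drive-virtio-disk1 || exit 0
      sleep 1
  done

  # last resort: revoke access to the image behind the guest's back
  virsh qemu-monitor-command --hmp guest1 'drive_del drive-virtio-disk1'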

On s390 we experience problems (QEMU abort) when asynchronous block IO
completes and the virtqueues are already gone. I suppose the
bdrv_drain_all() in bdrv_close() is a little late. I don't see such
problems with an early bdrv_drain_all() (drive_del) and an unplug
(device_del) afterwards.

Sounds like a bug.  Got a reproducer?



here is the complete core_backtrace:

7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269722 virtqueue_fill /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269924 virtqueue_push /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a400 virtio_blk_req_complete /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a6a4 virtio_blk_rw_complete /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x34686 bdrv_co_em_bh /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xcd40 aio_bh_poll /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xc7f0 aio_poll /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2ac9c bdrv_drain_all /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2a718 bdrv_close /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2b97e bdrv_delete /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x36db4 bdrv_unref /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa02ea drive_uninit /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa0448 drive_put_ref /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x9fae6 blockdev_auto_del /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd0bf0 release_drive /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c158 object_property_del_all /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c4e4 object_finalize /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18d6d2 object_unref /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c382 object_unparent /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x25ca14 virtio_ccw_busdev_unplug /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd6c88 qdev_unplug /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15c5f0 qmp_device_del /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x182b92 qmp_marshal_input_device_del /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fa36 qmp_call_cmd /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fd2c handle_qmp_command /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2fff3c json_message_process_token /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32adfe json_lexer_feed_char /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32b08c json_lexer_feed /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x3000ee json_message_parser_feed /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fe4c monitor_control_read /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15cd66 qemu_chr_be_write /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x162bfe tcp_chr_read /test/qemu/build/s390x-softmmu/qemu-system-s390x -
b8d9d8c328fccab2bc69bfa6d17bcb2f46dfb163 0x4ff58 g_main_context_dispatch /lib64/libglib-2.0.so.0 -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b654 glib_pollfds_poll /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b7e0 os_host_main_loop_wait /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b934 main_loop_wait /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1e6a7e main_loop /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1ef642 main /test/qemu/build/s390x-softmmu/qemu-system-s390x -

I suppose that asynchronous block IO processing completes after the
guest has already cleaned up the virtqueues.
Both machine check injection and device release processing are
triggered in the unplug callback virtio_ccw_busdev_unplug().
I suppose the device release processing (with its wait for
asynchronous processing to complete) happens too late.

What do you think?

This is interesting and definitely something that should never happen.

QEMU cannot rely on well-behaved guests, so we need to ensure that
hotplug works while guest I/O is in flight.

Does this reproducer trigger on qemu.git/master?  I'm concerned about
BlockDriverState refcounting and the virtio device model refcounts which
I think are sane in qemu.git/master but maybe not before 2.0.0.

Stefan


Yes, Stefan, I reproduced it on the master branch as of April 11th,
although it took quite some time.

Heinz



