Re: [Qemu-devel] drive_del vs. device_del: what should come first?


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
Date: Wed, 23 Apr 2014 15:01:57 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Apr 11, 2014 at 02:47:20PM +0200, Heinz Graalfs wrote:
> Hello Markus,
> 
> I finally managed to reproduce the problem, at least once ...
> 
> The scenario was:
> dd if=/dev/vdx1 of=partitionone
> 
> followed by a virsh detach... (with the device_del() under the covers)
> during active dd processing
> 
> dmesg shows:
> 
> [79026.220718] User process fault: interruption code 0x40009 in
> qemu-system-s390x[80000000+43c000]
> [79026.220737] CPU: 3 PID: 20117 Comm: qemu-system-s39 Not tainted
> 3.10.33+ #4
> [79026.220742] task: 0000000085930000 ti: 00000000e0b74000 task.ti:
> 00000000e0b74000
> [79026.220746] User PSW : 0705000180000000 0000000080269722 (0x80269722)
> [79026.220774]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0
> CC:0 PM:0 EA:3
> User GPRS: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [79026.220827]            0000000000000000 0000000000000000
> 00000000803761dc 00000000809001d0
> [79026.220831]            000003fffd0391a8 00000000808ff560
> 00000000808eb020 000003ffffc03838
> [79026.220834]            0000000000000000 0000000080375e88
> 00000000802696f6 000003ffffc03838
> [79026.220847] User Code: 0000000080269716: b9040034          lgr     %r3,%r4
>            000000008026971a: a7290000         lghi    %r2,0
>           #000000008026971e: b9870021         dlgr    %r2,%r1
>           >0000000080269722: b9040012         lgr     %r1,%r2
>            0000000080269726: 5010b0a0         st      %r1,160(%r11)
>            000000008026972a: 5820b0a0         l       %r2,160(%r11)
>            000000008026972e: e310b0a80004     lg      %r1,168(%r11)
>            0000000080269734: 58101000         l       %r1,0(%r1)
> [79026.220875] Last Breaking-Event-Address:
> [79026.220878]  [<000000008026904c>] 0x8026904c
> 
> with
> PSW addr: 0000000080269722 pointing into virtio_queue_empty
> 
> ...
> 000000008026a67c t virtio_notify_vector
> 00000000802694a4 T virtio_queue_empty
> 000000008026b440 T virtio_queue_get_addr
> ...
> 
> and coming from:
> R14   00000000802696f6 virtqueue_fill
> ...
> 0000000080269f74 T virtqueue_avail_bytes
> 0000000080269540 T virtqueue_fill
> 000000008026979c T virtqueue_flush
> 0000000080269c2c T virtqueue_get_avail_bytes
> ...
> 
> see the complete core_backtrace below ...
> 
> On 02/04/14 19:40, Markus Armbruster wrote:
> >Heinz Graalfs <address@hidden> writes:
> >
> >>On 01/04/14 17:48, Markus Armbruster wrote:
> >>>Heinz Graalfs <address@hidden> writes:
> >>>
> >>>>Hi Kevin,
> >>>>
> >>>>doing a
> >>>>
> >>>>       virsh detach-device ...
> >>>>
> >>>>ends up in the following QEMU monitor commands:
> >>>>
> >>>>1. device_del ...
> >>>>2. drive_del ...
> >>>>
> >>>>qmp_device_del() performs the device unplug path.
> >>>>In the case of a block device, do_drive_del() tries to
> >>>>prevent further IO against the host device.
> >>>>
> >>>>However, bdrv_find() during drive_del() results in
> >>>>an error, because the device is already gone. Due to
> >>>>this error, all the bdrv_xxx calls to quiesce the block
> >>>>driver, as well as all other processing, are skipped.
> >>>>
> >>>>Is the sequence that libvirt triggers OK?
> >>>>Shouldn't drive_del be executed first?
> >>>
> >>>No.
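
Here is a condensed sketch of the drive_del flow described above; it is a
paraphrase for illustration, not the verbatim QEMU source. bdrv_find(),
bdrv_drain_all(), bdrv_flush(), bdrv_close() and qerror_report() are real
block-layer and monitor interfaces of this era; the wrapper name
drive_del_sketch() is made up, and the sketch assumes QEMU's internal
headers (block.h, monitor/monitor.h, qapi/qmp/qerror.h):

/* Condensed sketch, not the actual do_drive_del() body. */
static int drive_del_sketch(const char *id)
{
    BlockDriverState *bs = bdrv_find(id);

    if (!bs) {
        /*
         * After device_del, the automatic drive delete has already
         * removed the BlockDriverState, so the lookup fails and we
         * return here; none of the quiesce steps below are reached.
         */
        qerror_report(QERR_DEVICE_NOT_FOUND, id);
        return -1;
    }

    /* Quiesce the block driver and prevent further I/O against the
     * host device; this only runs while the drive still exists. */
    bdrv_drain_all();
    bdrv_flush(bs);
    bdrv_close(bs);
    return 0;
}
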
> >>
> >>OK, I see. The drive is deleted implicitly (release_drive()).
> >>Re-adding the device after a device_del() requires another
> >>drive_add() AND device_add().
> >
> >Yes.  The automatic drive delete on unplug is a wart that will not be
> >preserved in the new interfaces we're building now.
> >
> >>(Doing just a device_add() complains about the missing drive.
> >>A subsequent info qtree makes QEMU abort.)
> >
> >Really?  Got a reproducer for me?
> >
> >>>drive_del is nasty.  Its purpose is to revoke access to an image even
> >>>when the guest refuses to cooperate.  To the guest, this looks like
> >>>hardware failure.
> >>
> >>Deleting a device during active IO is nasty and it should look like a
> >>hardware failure. I would expect lots of errors.
> >>
> >>>
> >>>If you drive_del before device_del, even a perfectly well-behaved
> >>>guest is exposed to a terminally broken device between drive_del and
> >>>completion of unplug.
> >>
> >>The early drive_del() would mean that no further IO would be
> >>possible.
> >
> >A polite way to describe the effect of drive_del on the guest.
> >
> >drive_del makes all attempts at actual I/O error out without any
> >forewarning whatsoever.  If you do that to your guest, and something
> >breaks, you get to keep the pieces.  Even if you "only" do it for a
> >short period of time.
> >
> >>>Always try a device_del first, and only if that does not complete within
> >>>reasonable time, and you absolutely must revoke access to the image,
> >>>then whack it over the head with drive_del.
> >>
> >>What is this reasonable time?
> >
> >Depends on how heavily loaded host and guest are.  Suggest measuring
> >with a suitably thrashing guest, then shifting left to taste.
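
For concreteness, the recommended ordering written out as a minimal
management-side sketch. This is not from the thread and not libvirt code:
the socket path, device ID and drive ID are placeholders, the JSON handling
is deliberately naive, and the 30-second timeout is arbitrary. The QMP
pieces it relies on (device_del, the DEVICE_DELETED event, and drive_del
issued through human-monitor-command) do exist in QEMU of this era.

/* Build: cc -o unplug unplug.c */
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to QEMU's QMP UNIX socket and enter command mode. */
static int qmp_connect(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    dprintf(fd, "{\"execute\": \"qmp_capabilities\"}\n");
    return fd;
}

/* Naive wait: poll until some reply/event text contains 'needle'. */
static int wait_for(int fd, const char *needle, int timeout_ms)
{
    char buf[4096];

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        ssize_t n;

        if (poll(&pfd, 1, timeout_ms) <= 0) {
            return -1;                  /* timeout or poll error */
        }
        n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0) {
            return -1;
        }
        buf[n] = '\0';
        if (strstr(buf, needle)) {
            return 0;
        }
    }
}

int main(void)
{
    /* Placeholder path and IDs, not values from this thread. */
    int fd = qmp_connect("/tmp/qmp.sock");

    if (fd < 0) {
        return 1;
    }

    /* 1. Polite unplug request. */
    dprintf(fd, "{\"execute\": \"device_del\", "
                "\"arguments\": {\"id\": \"virtio-disk1\"}}\n");

    /* 2. Give the guest a reasonable time to complete the unplug,
     *    signalled by the DEVICE_DELETED event. */
    if (wait_for(fd, "DEVICE_DELETED", 30 * 1000) == 0) {
        close(fd);
        return 0;                       /* clean unplug, done */
    }

    /* 3. Last resort: revoke access to the image with drive_del
     *    (an HMP command, reachable through human-monitor-command). */
    dprintf(fd, "{\"execute\": \"human-monitor-command\", \"arguments\": "
                "{\"command-line\": \"drive_del drive-virtio-disk1\"}}\n");
    close(fd);
    return 0;
}
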
> >
> >>On s390 we experience problems (QEMU abort) when async block IO
> >>completes and the virtqueues are already gone. I suppose the
> >>bdrv_drain_all() in bdrv_close() is a little late. I don't see such
> >>problems with an early bdrv_drain_all() (drive_del) and an unplug
> >>(device_del) afterwards.
> >
> >Sounds like a bug.  Got a reproducer?
> >
> 
> 
> here is the complete core_backtrace:
> 
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269722 virtqueue_fill
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269924 virtqueue_push
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a400
> virtio_blk_req_complete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a6a4
> virtio_blk_rw_complete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x34686 bdrv_co_em_bh
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xcd40 aio_bh_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xc7f0 aio_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2ac9c bdrv_drain_all
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2a718 bdrv_close
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2b97e bdrv_delete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x36db4 bdrv_unref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa02ea drive_uninit
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa0448 drive_put_ref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x9fae6 blockdev_auto_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd0bf0 release_drive
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c158
> object_property_del_all
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c4e4 object_finalize
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18d6d2 object_unref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c382 object_unparent
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x25ca14
> virtio_ccw_busdev_unplug
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd6c88 qdev_unplug
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15c5f0 qmp_device_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x182b92
> qmp_marshal_input_device_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fa36 qmp_call_cmd
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fd2c handle_qmp_command
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2fff3c
> json_message_process_token
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32adfe
> json_lexer_feed_char
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32b08c json_lexer_feed
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x3000ee
> json_message_parser_feed
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fe4c
> monitor_control_read
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15cd66 qemu_chr_be_write
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x162bfe tcp_chr_read
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> b8d9d8c328fccab2bc69bfa6d17bcb2f46dfb163 0x4ff58
> g_main_context_dispatch /lib64/libglib-2.0.so.0 -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b654 glib_pollfds_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b7e0
> os_host_main_loop_wait
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b934 main_loop_wait
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1e6a7e main_loop
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1ef642 main
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 
> I suppose that asynchronous block IO processing completes after the
> guest has already cleaned up the virtqueues.
> Both machine check injection and device release processing are
> triggered in the unplug callback virtio_ccw_busdev_unplug().
> I suppose the device release processing (with its wait for
> asynchronous processing to complete) happens too late.
> 
> What do you think?
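
To make that ordering concrete, here is a tiny standalone toy model of the
race described above; it is not QEMU code, and every name in it is invented
for the illustration. The pending completion only gets to run during the
drain at the very end of the release path, by which point the queue it
wants to fill is already gone.

/* Toy model only; compiles standalone, all names are made up. */
#include <stdio.h>
#include <stdlib.h>

typedef struct ToyQueue {
    int used_entries;           /* stands in for the vring used ring */
} ToyQueue;

typedef struct ToyRequest {
    ToyQueue **queue;           /* completion wants to push into *queue */
} ToyRequest;

/* Stands in for virtio_blk_rw_complete -> virtqueue_push/fill. */
static void toy_complete(ToyRequest *req)
{
    ToyQueue *q = *req->queue;

    if (!q) {
        printf("completion after queue teardown: the crash window\n");
        return;
    }
    q->used_entries++;
}

int main(void)
{
    ToyQueue *queue = calloc(1, sizeof(*queue));
    ToyRequest pending = { .queue = &queue };

    /*
     * Unplug path as in the backtrace: the device, and with it the
     * queue, is torn down first ...
     */
    free(queue);
    queue = NULL;

    /*
     * ... and only afterwards, during release_drive -> bdrv_close ->
     * bdrv_drain_all, does the pending completion get to run.
     */
    toy_complete(&pending);

    /*
     * Draining before the teardown (the ordering obtained by doing
     * drive_del before device_del) would run toy_complete() while
     * the queue is still valid.
     */
    return 0;
}
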

This is interesting and definitely something that should never happen.

QEMU cannot rely on well-behaved guests, so we need to ensure that
hotplug works while guest I/O is in flight.

Does this reproducer trigger on qemu.git/master?  I'm concerned about
BlockDriverState refcounting and the virtio device model refcounts, which
I think are sane in qemu.git/master but maybe not before 2.0.0.

Stefan


