qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] [PATCH RFC] blkdebug: make the fault injection functionalit


From: Hitoshi Mitake
Subject: [Qemu-devel] [PATCH RFC] blkdebug: make the fault injection functionality callable from QMP
Date: Wed, 13 Aug 2014 13:48:41 +0900

This patch makes the fault injection functionality of blkdebug
callable from QMP. Motivation of this change is for testing and
debugging distributed systems. Ordinal distributed systems must handle
hardware faults because of its reason for existence, but testing
whether the systems can hanle such faults and recover in a correct
manner is really hard.

Typically, developers of distributed systems check such recovery paths
with unit test or artificial environment which can be built in a
single box. But such tests can miss important attributes of real world
hardware faults. Examples of disk drive:
 - write(2) doesn't return -1 immediately in a case of disk error even
   a target file is opened with O_SYNC, if file system of the file is
   not mounted with barrier option
 - some disks become silent suddenly without causing errors, so
   applications must handle such a case with fine tuned timeout of
   disk I/O
 - some disks can cause performance degradation instead of stopping
   and causing errors [1]

For testing recovery paths and configuration of distributed systems,
mocking faults like the above examples in virtual devices is
effective. Because ordinal testing techniques which target errors of
library APIs and systemcalls cannot mock the above faults. In
addition, injecting faults at the level of virtual devices can test
whole stack of target systems (from device drivers to
applications). As a first step of implementing this testing technique,
this patch implements simple error injection for block devices via QMP
with using existing blkdebug. I think it is more useful for testing
distributed systems than existing rule based fault injection of
blkdebug. Because users can inject faults at any time.

With this feature, I could find a potential problem in the deployment
guide of OpenStack Swift [2]. In the guide, nobarrier option of xfs is
suggested without any caution. The option degrades durability of Swift
cluster because it delays detection of disk error. In addition, the
option is not suggested in a book of Swift guide [3]. So I concluded
the guide [2] can lead to a misconfiguration of Swift. I believe this
sort of problem can be found in other systems so the feature is useful
for developers and admins of distributed systems.

Of course I understand that testing distributed systems is not a
primary purpose of QEMU. If making blkdebug bloated with this feature
is not acceptable for QEMU community, I'll create a fork for this
purpose.

Example of launching QEMU with this feature:

sudo x86_64-softmmu/qemu-system-x86_64 -qmp \
tcp:localhost:4444,server,nowait -enable-kvm -hda \
blkdebug:/dev/null:/tmp/debian.qcow2

(/dev/null is needed because blkdebug requires configuration file, but
for QMP purpose empty file is enough)

Example of QMP sequence (via telnet localhost 4444):

{ "execute": "qmp_capabilities" }
{"return": {}}

{"execute":"query-block"}
{"return": [{"io-status": "ok", "device": "ide0-hd0", "locked": false,
"removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off",
"image": {"virtual-size": 10737418240, "filename": "", "format":
"raw", "actual-size": 1704513536}, "iops_wr": 0, "ro": false,
"backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0,
"encrypted": false, "bps": 0, "bps_rd": 0, "file": "",
"encryption_key_missing": false}, "type": "unknown"}, {"io-status":
"ok", "device": "ide1-cd0", "locked": false, "removable": true,
"tray_open": false, "type": "unknown"}, {"device": "floppy0",
"locked": false, "removable": true, "tray_open": false, "type":
"unknown"}, {"device": "sd0", "locked": false, "removable": true,
"tray_open": false, "type": "unknown"}]}

{"execute": "set-block-fault-state", "arguments": {"id": "ide0-hd0",
"state": "error"}} # <- inject error to /dev/sda
{"return": {}}

Now the guest OS on the VM finds the disk is broken.

Of course, using QMP directly is painful for users (developers and
admins of distributed systems). I'm implementing user friendly
interface in vagrant-kvm [4] for blackbox testing. In addition, a
testing framework for injecting faults at critical timing (which
requires solid understanding of target systems) is in progress.

[1] http://ucare.cs.uchicago.edu/pdf/socc13-limplock.pdf
[2] http://docs.openstack.org/developer/swift/howto_installmultinode.html
[3] http://www.amazon.com/dp/B00C93QFHI
[4] https://github.com/adrahon/vagrant-kvm

Cc: Kevin Wolf <address@hidden>
Cc: Stefan Hajnoczi <address@hidden>
Signed-off-by: Hitoshi Mitake <address@hidden>
---
 block/blkdebug.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 qapi-schema.json | 39 +++++++++++++++++++++++++++
 qmp-commands.hx  | 24 +++++++++++++++++
 3 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index f51407d..66d6bb1 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -26,6 +26,7 @@
 #include "qemu/config-file.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qmp-commands.h"
 
 typedef struct BDRVBlkdebugState {
     int state;
@@ -34,6 +35,8 @@ typedef struct BDRVBlkdebugState {
     QLIST_HEAD(, BlkdebugRule) rules[BLKDBG_EVENT_MAX];
     QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
     QLIST_HEAD(, BlkdebugSuspendedReq) suspended_reqs;
+
+    BlockFaultState qmp_fault_state;
 } BDRVBlkdebugState;
 
 typedef struct BlkdebugAIOCB {
@@ -456,16 +459,21 @@ static BlockDriverAIOCB *inject_error(BlockDriverState 
*bs,
     BlockDriverCompletionFunc *cb, void *opaque, BlkdebugRule *rule)
 {
     BDRVBlkdebugState *s = bs->opaque;
-    int error = rule->options.inject.error;
+    int error = 0;
     struct BlkdebugAIOCB *acb;
     QEMUBH *bh;
 
-    if (rule->options.inject.once) {
-        QSIMPLEQ_INIT(&s->active_rules);
-    }
+    if (s->qmp_fault_state != BLOCK_FAULT_STATE_NONE) {
+        error = EIO;
+    } else if (rule) {
+        error = rule->options.inject.error;
+        if (rule->options.inject.once) {
+            QSIMPLEQ_INIT(&s->active_rules);
+        }
 
-    if (rule->options.inject.immediately) {
-        return NULL;
+        if (rule->options.inject.immediately) {
+            return NULL;
+        }
     }
 
     acb = qemu_aio_get(&blkdebug_aiocb_info, bs, cb, opaque);
@@ -485,6 +493,10 @@ static BlockDriverAIOCB 
*blkdebug_aio_readv(BlockDriverState *bs,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;
 
+    if (s->qmp_fault_state != BLOCK_FAULT_STATE_NONE) {
+        return inject_error(bs, cb, opaque, rule);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         if (rule->options.inject.sector == -1 ||
             (rule->options.inject.sector >= sector_num &&
@@ -507,6 +519,10 @@ static BlockDriverAIOCB 
*blkdebug_aio_writev(BlockDriverState *bs,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;
 
+    if (s->qmp_fault_state != BLOCK_FAULT_STATE_NONE) {
+        return inject_error(bs, cb, opaque, rule);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         if (rule->options.inject.sector == -1 ||
             (rule->options.inject.sector >= sector_num &&
@@ -687,6 +703,56 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
     return bdrv_getlength(bs->file);
 }
 
+BlockFaultState qmp_query_block_fault_state(const char *id, Error **errp)
+{
+    BlockDriverState *bs;
+    BDRVBlkdebugState *s;
+
+    bs = bdrv_find(id);
+    if (!bs) {
+        error_setg(errp, "BlockDriver (%s) doesn't exist", id);
+        return BLOCK_FAULT_STATE_NONE;
+    }
+    bs = bs->file;
+
+    s = bs->opaque;
+    return s->qmp_fault_state;
+}
+
+void qmp_set_block_fault_state(const char *id, BlockFaultState state,
+    Error **errp)
+{
+    BlockDriverState *bs;
+    BDRVBlkdebugState *s;
+
+    bs = bdrv_find(id);
+    if (!bs) {
+        error_setg(errp, "BlockDriverState (%s) doesn't exist", id);
+        return;
+    }
+
+    bs = bs->file;
+    if (strcmp(bs->drv->format_name, "blkdebug")) {
+        error_setg(errp, "BlockDriver (%s) isn't blkdebug",
+                   bs->drv->format_name);
+        return;
+    }
+
+    s = bs->opaque;
+    s->qmp_fault_state = state;
+}
+
+static int blkdebug_co_flush_to_disk(BlockDriverState *bs)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+
+    if (s->qmp_fault_state == BLOCK_FAULT_STATE_NONE) {
+        return bdrv_co_flush(bs->file);
+    }
+
+    return -EIO;
+}
+
 static BlockDriver bdrv_blkdebug = {
     .format_name            = "blkdebug",
     .protocol_name          = "blkdebug",
@@ -706,6 +772,8 @@ static BlockDriver bdrv_blkdebug = {
                                 = blkdebug_debug_remove_breakpoint,
     .bdrv_debug_resume          = blkdebug_debug_resume,
     .bdrv_debug_is_suspended    = blkdebug_debug_is_suspended,
+
+    .bdrv_co_flush_to_disk  = blkdebug_co_flush_to_disk,
 };
 
 static void bdrv_blkdebug_init(void)
diff --git a/qapi-schema.json b/qapi-schema.json
index 341f417..6781642 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3481,3 +3481,42 @@
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# @BlockFaultState:
+#
+# Injected fault state for a block device.
+#
+# @none: no fault
+#
+# @error: visible error
+#
+{ 'enum': 'BlockFaultState', 'data': [ 'none', 'error' ] }
+
+##
+# @query-block-fault-state:
+#
+# Return block fault state information for the given block device.
+#
+# @name: name of block device
+#
+# Returns: @BlockFaultState of a block device.
+#
+# Since: 1.6
+##
+{ 'command': 'query-block-fault-state', 'data': { 'id': 'str' },
+  'returns': 'BlockFaultState' }
+##
+# @set-block-fault-state:
+#
+# Set fault state for the given block device.
+#
+# @name: the chardev's ID, must exist and not be in use
+# @state: fault state
+#
+# Returns: Nothing on success
+#
+# Since: 1.6
+##
+{ 'command': 'set-block-fault-state', 'data': {'id': 'str',
+  'state': 'BlockFaultState' }}
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..96ae7e2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3755,3 +3755,27 @@ Example:
 <- { "return": {} }
 
 EQMP
+    {
+        .name       = "query-block-fault-state",
+        .args_type  = "id:s",
+        .mhandler.cmd_new = qmp_marshal_input_query_block_fault_state,
+    },
+
+SQMP
+query-block-fault-state
+-----------------------
+
+Show block fault information.
+
+EQMP
+    {
+        .name       = "set-block-fault-state",
+        .args_type  = "id:s,state:q",
+        .mhandler.cmd_new = qmp_marshal_input_set_block_fault_state,
+    },
+
+SQMP
+set-block-fault-state
+-----------------------
+
+Set block fault information.
-- 
1.8.3.2




reply via email to

[Prev in Thread] Current Thread [Next in Thread]