qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] RFC: Partial workaround for buggy guest virtio-balloon driv


From: David Gibson
Subject: [Qemu-devel] RFC: Partial workaround for buggy guest virtio-balloon driver
Date: Thu, 8 Nov 2012 15:45:22 +1100
User-agent: Mutt/1.5.21 (2010-09-15)

Linux kernel commits 1a87228f5f1d316002c7c161316f5524592be766
"virtio_balloon: Fix endian bug" and
3ccc9372ed0fab33d20f10be3c1efd5776ff5913 "virtio_balloon: fix handling
of PAGE_SIZE != 4k" fixed two serious bugs in their (guest side)
handling of the virtio balloon.  In practice, these bugs only affected
powerpc guests, which is big-endian and frequently configured for 64k
base page size.  Attempting to use the balloon with the buggy guest
would usually result in an immediate guest crash.

The bugs are fixed now, of course and the balloon works fine with
kernels v3.4 and later, but unfortunately there are many distro
releases still in use which still have buggy kernels.

The nature of the page size bug makes it impossible to work around
from the host side (there simply isn't enough information supplied to
operate the balloon correctly).  However, it *is* possible with some
fiddling to safely detect the endian bug in the guest kernel, and
disable the balloon in this case.  The two fixes were applied to the
mainline kernel almost consecutively, so there are no released kernels
with one fix but not the other, meaning we can use the presence of the
endian bug as a proxy for the presence of the page size bug.

This patch implements such a test, working as follows.

For a fixed guest kernel:
  1. qemu sets a state variable to "TESTING" and the initial balloon
     target size to 16 (4k pages).
  2. When the guest balloon driver starts, it sees the target, finds
     either 16 unused 4k pages or 1 unused 64k page (depending on
     guest config) and submits the 16 resulting 4k pfns to the
     host. qemu, in TESTING state, ignores the submitted pages for
     now.
  4. The guest then updates the 'actual' field in the balloon config
     space to 16.  qemu sees this and determines that the guest is not
     buggy, it moves to CLEANUP state, and sets the target balloon
     size back to 0.
  5. The guest sees the target go back to 0, and reclaims its page(s)
     from the balloon.  qemu continues to ignore the page addresses
     for now in CLEANUP state.
  6. The guest updates the actual field to 0.  qemu now considers
     cleanup complete and moves to GOOD state.  The balloon now
     operates normally.

For a buggy kernel:
  1. qemu sets a state variable to "TESTING" and the initial balloon
     target size to 16 (4k pages).
  2. When the guest balloon driver starts it sees the non-zero target,
     and misinterprets it as 268435456 (endian bug).  It starts trying
     to find pages to free.
  3. The guest is probably newly booted, so it almost certainly finds
     256 pages easily and submits incorrect addresses for them (page
     size bug) to the host. qemu, in TESTING state ignores the (wrong)
     addresses for now.
  4. The guest updates the actual field in config space to 256.  qemu
     sees this, and determines that the guest is buggy.  It moves to
     BUGGY state and sets the balloon target back to zero.
  5. The guest, before attempting to find the next batch of pages to
     free, rechecks the target and discovers it is now zero.  It
     reclaims the pages by submitting more wrong addresses, which qemu
     ignores.
  6. The balloon is now disabled, if the user attempts to change the
     balloon size, qemu print an error message and otherwise ignore
     it.


I'm aware that this patch needs a bunch more comments (largely based
on the info above), but otherwise do people think this is a reasonable
approach.  It doesn't (and can't) fix the balloon for buggy kernels,
but it does make the failure mode a lot less ugly.

>From dbc721f5e50a39ca3b40d81f060d8bb0e6312995 Mon Sep 17 00:00:00 2001
From: David Gibson <address@hidden>
Date: Thu, 8 Nov 2012 14:49:38 +1100
Subject: [PATCH] Detection of buggy guest balloon

---
 hw/virtio-balloon.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 72 insertions(+), 5 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index dd1a650..6a9cd3f 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -37,8 +37,16 @@ typedef struct VirtIOBalloon
     VirtQueueElement stats_vq_elem;
     size_t stats_vq_offset;
     DeviceState *qdev;
+
+    int guest_bug_state;
+#define GUEST_BUG_TESTING       0
+#define GUEST_BUG_CLEANUP       1
+#define GUEST_BUG_BUGGY         2
+#define GUEST_BUG_GOOD          3
 } VirtIOBalloon;
 
+#define GUEST_BUG_TARGET        16
+
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
 {
     return (VirtIOBalloon *)vdev;
@@ -84,6 +92,11 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, 
VirtQueue *vq)
             pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
             offset += 4;
 
+            if (s->guest_bug_state != GUEST_BUG_GOOD) {
+                /* Still bug testing, not ready to use balloon yet */
+                continue;
+            }
+
             /* FIXME: remove get_system_memory(), but how? */
             section = memory_region_find(get_system_memory(), pa, 1);
             if (!section.size || !memory_region_is_ram(section.mr))
@@ -134,8 +147,23 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 {
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
+    uint32_t num_pages;
+
+    switch (dev->guest_bug_state) {
+    case GUEST_BUG_TESTING:
+        num_pages = GUEST_BUG_TARGET;
+        break;
 
-    config.num_pages = cpu_to_le32(dev->num_pages);
+    case GUEST_BUG_CLEANUP:
+    case GUEST_BUG_BUGGY:
+        num_pages = 0;
+        break;
+
+    default:
+        num_pages = dev->num_pages;
+    }
+
+    config.num_pages = cpu_to_le32(num_pages);
     config.actual = cpu_to_le32(dev->actual);
 
     memcpy(config_data, &config, 8);
@@ -147,11 +175,41 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
     uint32_t oldactual = dev->actual;
+
     memcpy(&config, config_data, 8);
     dev->actual = le32_to_cpu(config.actual);
-    if (dev->actual != oldactual) {
-        qemu_balloon_changed(ram_size -
-                             (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+
+    switch (dev->guest_bug_state) {
+    case GUEST_BUG_TESTING:
+        if (dev->actual == 0) {
+            /* Both buggy and non-buggy guests write a 0 before going
+             * on to write a meaningful value */
+            break;
+        }
+
+        if (dev->actual > GUEST_BUG_TARGET) {
+            fprintf(stderr, "virtio-balloon: Buggy guest detected, disabling 
balloon\n");
+            dev->guest_bug_state = GUEST_BUG_BUGGY;
+        } else {
+            dev->guest_bug_state = GUEST_BUG_CLEANUP;
+        }
+        /* Changing bug state implicitly alters the config */
+        virtio_notify_config(&dev->vdev);
+        break;
+
+    case GUEST_BUG_CLEANUP:
+        if (dev->actual == 0) {
+            /* Cleanup completed, proceed with normal operation */
+            dev->guest_bug_state = GUEST_BUG_GOOD;
+            virtio_notify_config(&dev->vdev);
+        }
+        break;
+
+    default:
+        if (dev->actual != oldactual) {
+            qemu_balloon_changed(ram_size -
+                                 (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+        }
     }
 }
 
@@ -194,12 +252,21 @@ static void virtio_balloon_to_target(void *opaque, 
ram_addr_t target)
 {
     VirtIOBalloon *dev = opaque;
 
+    if (dev->guest_bug_state == GUEST_BUG_BUGGY) {
+        fprintf(stderr, "Guest is buggy, cannot use balloon\n");
+    }
+
     if (target > ram_size) {
         target = ram_size;
     }
     if (target) {
         dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
-        virtio_notify_config(&dev->vdev);
+
+        /* If we're still testing for guest bugs, delay the change
+         * interrupt until we've finished that */
+        if (dev->guest_bug_state == GUEST_BUG_GOOD) {
+            virtio_notify_config(&dev->vdev);
+        }
     }
 }
 
-- 
1.7.10.4



-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



reply via email to

[Prev in Thread] Current Thread [Next in Thread]