From: David Gibson
Subject: Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes
Date: Thu, 14 Apr 2016 13:39:51 +1000
On Wed, 13 Apr 2016 16:15:25 +0300
"Michael S. Tsirkin" <address@hidden> wrote:
> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
> > The balloon code currently calls madvise() with TARGET_PAGE_SIZE
> > as length parameter, and an address which is directly based on
> > the page address supplied by the guest. Since the virtio-balloon
> > protocol is always based on 4k based addresses/sizes, no matter
> > what the host and guest are using as page size, this has a couple
> > of issues which could even lead to data loss in the worst case.
> >
> > TARGET_PAGE_SIZE might not be 4k, so it is wrong to use that
> > value for the madvise() call. If TARGET_PAGE_SIZE is bigger than
> > 4k, we also destroy the 4k areas after the current one - which
> > might be wrong since the guest did not want to free that area yet
> > (in case the guest used a smaller MMU page size than the hard-coded
> > TARGET_PAGE_SIZE). So to fix this issue, introduce a proper define
> > called BALLOON_PAGE_SIZE (which is 4096) and use it as the size
> > parameter for the madvise() call instead.
>
> this makes absolute sense.
>
>
> > Then, there's yet another problem: If the host page size is bigger
> > than the 4k balloon page size, we can not simply call madvise() on
> > each of the 4k balloon addresses that we get from the guest - since
> > the madvise() always evicts the whole host page, not only a 4k area!
>
> Does it really round length up?
> Wow, it does:
> len = (len_in + ~PAGE_MASK) & PAGE_MASK;
>
> which seems to be undocumented, but has been there forever.
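In other words, the kernel silently rounds the madvise() length up to a
full host page. A minimal stand-alone sketch of that round-up, assuming a
64k-page host (PAGE_SIZE/PAGE_MASK are local stand-ins here, not the
kernel's own macros):

#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 0x10000UL                 /* assumed 64k host page */
#define PAGE_MASK (~(PAGE_SIZE - 1))

int main(void)
{
    size_t len_in = 4096;                   /* one 4k balloon page */
    /* the same round-up the kernel applies to madvise()'s length: */
    size_t len = (len_in + ~PAGE_MASK) & PAGE_MASK;
    printf("madvise(len_in=%zu) really covers %zu bytes\n", len_in, len);
    return 0;                               /* prints: ... 65536 bytes */
}

So a DONTNEED on a single 4k balloon page takes out the whole surrounding
64k host page.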
>
>
> > So in this case, we've got to track the 4k fragments of a host page
> > and only call madvise(DONTNEED) when all fragments have been collected.
> > This of course only works if the guest sends consecutive 4k
> > fragments - which is the case in the most important scenarios that
> > I try to address here (like a ppc64 guest with 64k page size that
> > is running on a ppc64 host with 64k page size). In case the guest
> > uses a page size that is smaller than the host page size, we might
> > need to add some additional logic here later to increase the
> > probability of being able to release memory, but at least the guest
> > should no longer crash due to unintentionally evicted pages.
> >
> > Signed-off-by: Thomas Huth <address@hidden>
> > ---
> > I've tested this patch with both a 4k page size ppc64 guest
> > and a 64k page size ppc64 guest on a 64k page size ppc64 host.
> > With this patch applied, I was not able to crash the guests
> > anymore (the 4k guest easily crashes without this patch).
> > And looking at the output of the "free" command on the host,
> > the ballooning also still works as expected.
> >
> >  hw/virtio/virtio-balloon.c         | 68 ++++++++++++++++++++++++++++++++++----
> >  include/hw/virtio/virtio-balloon.h |  3 ++
> >  2 files changed, 65 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index c74101e..886faa8 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -35,13 +35,56 @@
> >  #include "hw/virtio/virtio-bus.h"
> >  #include "hw/virtio/virtio-access.h"
> >  
> > -static void balloon_page(void *addr, int deflate)
> > +#define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT)
> > +#define BALLOON_NO_CURRENT_PAGE ((void *)-1)
> > +
> > +static void balloon_page(VirtIOBalloon *s, void *addr, int deflate)
> >  {
> >  #if defined(__linux__)
> > -    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
> > -                                         kvm_has_sync_mmu())) {
> > -        qemu_madvise(addr, TARGET_PAGE_SIZE,
> > -                deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> > +    size_t host_page_size;
> > +    void *aligned_addr;
> > +
> > +    if (qemu_balloon_is_inhibited() || (kvm_enabled() && !kvm_has_sync_mmu())) {
> > +        return;
> > +    }
> > +
> > +    host_page_size = getpagesize();
> > +    if (host_page_size <= BALLOON_PAGE_SIZE) {
> > +        qemu_madvise(addr, BALLOON_PAGE_SIZE,
> > +                     deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> > +        return;
> > +    }
> > +
> > +    /* Host page size > balloon page size ==> use aligned host address */
> > +    aligned_addr = (void *)((uintptr_t)addr & ~(host_page_size - 1));
> > +    if (deflate) {
> > +        /* MADV_WILLNEED is just a hint for the host kernel, the guest should
> > +         * also be able to use the memory again without this call, so let's
> > +         * only do it for the first, aligned fragment of a host page and
> > +         * ignore it for the others.
> > +         */
> > +        if (addr == aligned_addr) {
> > +            qemu_madvise(aligned_addr, host_page_size, QEMU_MADV_WILLNEED);
> > +        }
> > +        s->current_addr = BALLOON_NO_CURRENT_PAGE;
> > +    } else {
> > +        const int max_frags = host_page_size / BALLOON_PAGE_SIZE;
> > +        /* If we end up here, that means we want to evict balloon pages, but
> > +         * the host's page size is bigger than the 4k pages from the balloon.
> > +         * Since madvise() only works on the granularity of the host page size,
> > +         * we've got to collect all the 4k fragments from the guest first
> > +         * before we can issue the MADV_DONTNEED call.
> > +         */
> > +        if (aligned_addr != s->current_addr) {
> > +            memset(s->fragment_bits, 0, s->fragment_bits_size);
> > +            s->current_addr = aligned_addr;
> > +        }
> > +        set_bit((addr - aligned_addr) / BALLOON_PAGE_SIZE, s->fragment_bits);
> > +        if (find_first_zero_bit(s->fragment_bits, max_frags) == max_frags) {
> > +            qemu_madvise(aligned_addr, host_page_size, QEMU_MADV_DONTNEED);
> > +            memset(s->fragment_bits, 0, s->fragment_bits_size);
> > +            s->current_addr = BALLOON_NO_CURRENT_PAGE;
> > +        }
> >      }
> >  #endif
> >  }
> > @@ -240,7 +283,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >              /* Using memory_region_get_ram_ptr is bending the rules a bit, but
> >                 should be OK because we only want a single page.  */
> >              addr = section.offset_within_region;
> > -            balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
> > +            balloon_page(s, memory_region_get_ram_ptr(section.mr) + addr,
> >                           !!(vq == s->dvq));
> >              memory_region_unref(section.mr);
> >          }
> > @@ -455,6 +498,13 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
> >      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> >      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> >  
> > +    if (getpagesize() > BALLOON_PAGE_SIZE) {
> > +        s->fragment_bits_size = (getpagesize() / BALLOON_PAGE_SIZE
> > +                                 + sizeof(long) * 8 - 1) / 8;
> > +        s->fragment_bits = g_malloc0(s->fragment_bits_size);
> > +        s->current_addr = BALLOON_NO_CURRENT_PAGE;
> > +    }
> > +
> >      reset_stats(s);
> >  
> >      register_savevm(dev, "virtio-balloon", -1, 1,
> > @@ -466,6 +516,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> >  
> > +    g_free(s->fragment_bits);
> >      balloon_stats_destroy_timer(s);
> >      qemu_remove_balloon_handler(s);
> >      unregister_savevm(dev, "virtio-balloon", s);
> > @@ -480,6 +531,11 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >          g_free(s->stats_vq_elem);
> >          s->stats_vq_elem = NULL;
> >      }
> > +
> > +    if (s->fragment_bits) {
> > +        memset(s->fragment_bits, 0, s->fragment_bits_size);
> > +        s->current_addr = BALLOON_NO_CURRENT_PAGE;
> > +    }
> >  }
> >  
> >  static void virtio_balloon_instance_init(Object *obj)
> > diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> > index 35f62ac..04b7c0c 100644
> > --- a/include/hw/virtio/virtio-balloon.h
> > +++ b/include/hw/virtio/virtio-balloon.h
> > @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
> >      int64_t stats_last_update;
> >      int64_t stats_poll_interval;
> >      uint32_t host_features;
> > +    void *current_addr;
> > +    unsigned long *fragment_bits;
> > +    int fragment_bits_size;
> >  } VirtIOBalloon;
> >  
> >  #endif
> > -- 
> > 1.8.3.1
>
>
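To see the deflate-path bookkeeping of the patch in isolation, here is a
minimal, self-contained model of the fragment tracking. It hard-codes a
64k host page and uses a plain uint64_t as the bitmap, where the patch
itself sizes s->fragment_bits dynamically from getpagesize() and uses
QEMU's set_bit()/find_first_zero_bit() helpers:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define BALLOON_PAGE_SIZE 4096UL
#define HOST_PAGE_SIZE    65536UL                    /* assumed 64k host page */
#define MAX_FRAGS         (HOST_PAGE_SIZE / BALLOON_PAGE_SIZE)   /* = 16 */

/* 64k/4k = 16 fragments, so a single uint64_t suffices as the bitmap here */
static uint64_t fragment_bits;
static uintptr_t current_addr = (uintptr_t)-1;       /* "no current page" */

/* Returns true once every 4k fragment of the host page containing 'addr'
 * has been seen, i.e. the point where madvise(DONTNEED) becomes safe. */
static bool track_fragment(uintptr_t addr)
{
    uintptr_t aligned = addr & ~(HOST_PAGE_SIZE - 1);

    if (aligned != current_addr) {                   /* new host page: restart */
        fragment_bits = 0;
        current_addr = aligned;
    }
    fragment_bits |= 1ULL << ((addr - aligned) / BALLOON_PAGE_SIZE);

    if (fragment_bits == (1ULL << MAX_FRAGS) - 1) {  /* all fragments seen */
        fragment_bits = 0;
        current_addr = (uintptr_t)-1;
        return true;
    }
    return false;
}

int main(void)
{
    /* Guest balloons 16 consecutive 4k pages of one 64k host page */
    for (unsigned i = 0; i < MAX_FRAGS; i++) {
        if (track_fragment(0x100000 + i * BALLOON_PAGE_SIZE)) {
            printf("fragment %u completes the host page\n", i);
        }
    }
    return 0;
}

Only the 16th call reports completion, which is where the patch issues the
single madvise(MADV_DONTNEED) for the whole host page.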
> It looks like fragment_bits would have to be migrated.
Hmm.. do they? If we just ignore migration, isn't the worst that
happens that we just keep one extra page allocated?
> Which is a lot of complexity.
> And workarounds for specific guest behaviour are really ugly.
> There are patches on-list to maintain a balloon bitmap -
> that will enable fixing it cleanly.
> How about we just skip madvise if host page size is > balloon
> page size, for 2.6?
>
> --
> MST
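For reference, that stop-gap would be a one-line guard in balloon_page() -
a sketch against the pre-patch code shown above, reusing the patch's
BALLOON_PAGE_SIZE define, not a tested change:

static void balloon_page(void *addr, int deflate)
{
#if defined(__linux__)
    /* Possible 2.6 stop-gap (sketch): when the host page is larger than
     * the 4k balloon page, madvise() would also evict the neighbouring
     * balloon pages, so skip it entirely rather than risk data loss. */
    if (getpagesize() > BALLOON_PAGE_SIZE) {
        return;
    }
    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() || kvm_has_sync_mmu())) {
        qemu_madvise(addr, BALLOON_PAGE_SIZE,
                     deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
    }
#endif
}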
--
David Gibson <address@hidden>
Senior Software Engineer, Virtualization, Red Hat