From: Xu, Anthony
Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
Date: Tue, 14 Mar 2017 05:14:44 +0000

> > > Subpages never have subregions, so the loop never runs.  The
> > > begin/commit pair then becomes:
> > >
> > >     ++memory_region_transaction_depth;
> > >     --memory_region_transaction_depth;
> > >     if (!memory_region_transaction_depth) {
> > >         if (memory_region_update_pending) {
> > >             ...
> > >         } else if (ioeventfd_update_pending) {
> > >             ...
> > >         }
> > >         // memory_region_clear_pending()
> > >         memory_region_update_pending = false;
> > >         ioeventfd_update_pending = false;
> > >    }
> > >
> > > If memory_region_transaction_depth is > 0 the begin/commit pair does
> > > nothing.
> > >
> > > But if memory_region_transaction_depth is == 0, there should be no
> > > update pending because the loop has never run.  So I don't see what
> > > your patch can change.
> >
> > As I mentioned in PATCH 1, this patch is used to fix an issue after we
> > remove the global lock in the RCU callback.  After the global lock is
> > removed, another thread may set up a pending update, so
> > memory_region_transaction_commit may try to rebuild PhysPageMap even
> > if the loop doesn't run, while another thread tries to rebuild
> > PhysPageMap at the same time; that is a race condition.
> > A subpage MemoryRegion is a special MemoryRegion: it doesn't belong
> > to any address space and is only used to handle subpages.  We may use
> > a new structure other than MemoryRegion to handle subpages to make the
> > logic clearer.  After that change, the RCU callback will not free any
> > MemoryRegion.
> 
> This is not true.  Try hot-unplugging a device.

You are right: hot-unplugging does cause memory regions (not subpage ones)
to be freed in the RCU thread.

I tried PCI device hot plug/unplug.
When plugging a device,
handle_hmp_command->qmp_device_add->qdev_device_add->
virtio_pci_dc_realize->pci_qdev_realize->virtio_pci_realize->
virtio_device_realize->virtio_bus_device_plugged->virtio_pci_device_plugged
are called.
When unplugging a device,
kvm_cpu_exec->memory_region_write_accessor->pci_write->
virtio_device_unrealize->virtio_pci_device_unplugged
are called.

Memory region addition and removal happen in virtio_pci_device_plugged
and virtio_pci_device_unplugged respectively.  Memory region operations
need to acquire the global lock, but neither happens in the RCU thread.
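
For illustration, the locking pattern on those paths looks roughly like
this (a minimal sketch using the existing memory API; the function names,
parent region, and offset are made up for the example -- in the real code
the lock is typically already held when realize/unrealize runs):

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "exec/memory.h"

/* Hypothetical helpers: region addition/removal runs under the global
 * QEMU lock, never in the RCU thread. */
static void example_plug(MemoryRegion *parent, MemoryRegion *dev_mr)
{
    qemu_mutex_lock_iothread();                /* global QEMU lock */
    memory_region_add_subregion(parent, 0x1000, dev_mr);
    qemu_mutex_unlock_iothread();
}

static void example_unplug(MemoryRegion *parent, MemoryRegion *dev_mr)
{
    qemu_mutex_lock_iothread();
    /* Removes the region from the address space; actual reclamation is
     * deferred and may later run in the RCU thread. */
    memory_region_del_subregion(parent, dev_mr);
    qemu_mutex_unlock_iothread();
}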

When memory_region_finalize is called, the memory region has already been
removed from the address space (it was removed in
virtio_pci_device_unplugged), and both mr->subregions and mr->coalesced
are empty.  That makes sense to me: by the time memory_region_finalize is
called, the memory region is no longer in use.
Please correct me if I'm wrong here; I have only tried PCI device hot
plug/unplug.

If the above assumption is correct, it seems we don't need the global
lock for memory region reclamation in the RCU thread.  Please let me know
if any other memory reclamation in the RCU thread needs the global lock.

Under that assumption, I have the patch below to remove the global lock
in the RCU thread.  I tested VM boot, reboot, shutdown, and PCI device
hot plug/unplug.
Please review the patch; if there are no further issues, I will send out
an official patch later.

Thanks,
Anthony


diff --git a/memory.c b/memory.c
index 6c58373..43e06e9 100644
--- a/memory.c
+++ b/memory.c
@@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj)
      * and cause an infinite loop.
      */
     mr->enabled = false;
-    memory_region_transaction_begin();
-    while (!QTAILQ_EMPTY(&mr->subregions)) {
-        MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
-        memory_region_del_subregion(mr, subregion);
-    }
-    memory_region_transaction_commit();
-
+    assert(QTAILQ_EMPTY(&mr->subregions));
     mr->destructor(mr);
-    memory_region_clear_coalescing(mr);
+    assert(QTAILQ_EMPTY(&mr->coalesced));
     g_free((char *)mr->name);
     g_free(mr->ioeventfds);
 }
diff --git a/util/rcu.c b/util/rcu.c
index 9adc5e4..51e0248 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -254,24 +254,20 @@ static void *call_rcu_thread(void *opaque)

         atomic_sub(&rcu_call_count, n);
         synchronize_rcu();
-        qemu_mutex_lock_iothread();
         while (n > 0) {
             node = try_dequeue();
             while (!node) {
-                qemu_mutex_unlock_iothread();
                 qemu_event_reset(&rcu_call_ready_event);
                 node = try_dequeue();
                 if (!node) {
                     qemu_event_wait(&rcu_call_ready_event);
                     node = try_dequeue();
                 }
-                qemu_mutex_lock_iothread();
             }

             n--;
             node->func(node);
         }
-        qemu_mutex_unlock_iothread();
     }
     abort();
 }
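
For readability, the resulting dequeue loop in call_rcu_thread,
reconstructed from the diff above, is simply:

        /* After the patch: callbacks run without the iothread lock. */
        while (n > 0) {
            node = try_dequeue();
            while (!node) {
                qemu_event_reset(&rcu_call_ready_event);
                node = try_dequeue();
                if (!node) {
                    qemu_event_wait(&rcu_call_ready_event);
                    node = try_dequeue();
                }
            }

            n--;
            node->func(node);
        }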





> I'm all for reducing the scope of the global QEMU lock,

Thanks,

> but this needs a plan and a careful analysis of the involved data
> structures across _all_ instance_finalize implementations.

Agreed.

The following functions are registered as RCU callbacks:
address_space_dispatch_free
do_address_space_destroy
flatview_unref
reclaim_ramblock
qht_map_destroy
migration_bitmap_free

The first three are address-space related and should work without the
global lock per the above analysis.  The rest are very simple and don't
seem to need the global lock.
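
For reference, such callbacks are queued through QEMU's call_rcu()
machinery; the usual pattern looks like this (a minimal sketch with
illustrative type and field names -- the rcu_head is placed first so the
callback receives a pointer to the enclosing object):

#include "qemu/osdep.h"
#include "qemu/rcu.h"

/* Illustrative type: an object reclaimed through RCU embeds a
 * struct rcu_head and names that field in call_rcu(). */
typedef struct MyObject {
    struct rcu_head rcu;
    char *data;
} MyObject;

/* Runs in the RCU thread after a grace period.  With the rcu.c change
 * above it no longer holds the global lock, so it must not touch
 * BQL-protected state. */
static void my_object_free(MyObject *obj)
{
    g_free(obj->data);
    g_free(obj);
}

static void my_object_release(MyObject *obj)
{
    call_rcu(obj, my_object_free, rcu);    /* deferred free */
}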




> 
> Paolo
