On Wed, May 24, 2017 at 03:53:05PM +0800, Peter Xu wrote:
On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
This patch provides blocktime calculation per vCPU,
as a summary and as a overlapped value for all vCPUs.
This approach was suggested by Peter Xu, as an improvements of
previous approch where QEMU kept tree with faulted page address and cpus bitmask
in it. Now QEMU is keeping array with faulted page address as value and vCPU
as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
list for blocktime per vCPU (could be traced with page_fault_addr)
Blocktime will not calculated if postcopy_blocktime field of
MigrationIncomingState wasn't initialized.
Signed-off-by: Alexey Perevalov <address@hidden>
---
migration/postcopy-ram.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++-
migration/trace-events | 5 ++-
2 files changed, 105 insertions(+), 2 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d647769..e70c44b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,6 +23,7 @@
#include "postcopy-ram.h"
#include "sysemu/sysemu.h"
#include "sysemu/balloon.h"
+#include <sys/param.h>
#include "qemu/error-report.h"
#include "trace.h"
@@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
return 0;
}
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+ CPUState *cpu_iter;
+
+ CPU_FOREACH(cpu_iter) {
+ if (cpu_iter->thread_id == pid) {
+ return cpu_iter->cpu_index;
+ }
+ }
+ trace_get_mem_fault_cpu_index(pid);
+ return -1;
+}
+
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+ RAMBlock *rb)
+{
+ int cpu;
+ unsigned long int nr_bit;
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+ int64_t now_ms;
+
+ if (!dc || ptid == 0) {
+ return;
+ }
+ cpu = get_mem_fault_cpu_index(ptid);
+ if (cpu < 0) {
+ return;
+ }
+ nr_bit = get_copied_bit_offset(addr);
+ if (test_bit(nr_bit, mis->copied_pages)) {
+ return;
+ }
+ now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ if (dc->vcpu_addr[cpu] == 0) {
+ atomic_inc(&dc->smp_cpus_down);
+ }
+
+ atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+ atomic_xchg__nocheck(&dc->last_begin, now_ms);
+ atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
Looks like this is not what you and Dave have discussed?
(Btw, sorry to have not followed the thread recently, so I just went
over the discussion again...)
What I see that Dave suggested is (I copied from Dave's email):
blocktime_start:
set CPU stall address
check bitmap entry
if set then zero stall-address
While here it is:
blocktime_start:
check bitmap entry
if set then return
set CPU stall address
I don't think current version can really solve the risk condition. See
this possible sequence:
receive-thread fault-thread
-------------- ------------
blocktime_start
check bitmap entry,
if set then return
blocktime_end
set bitmap entry
read CPU stall address,
if none-0 then zero it
set CPU stall address [1]
Then imho the address set at [1] will be stall again until forever.
agree, I check is in incorrect order
I think we should follow exactly what Dave has suggested.
And.. after a second thought, I am afraid even this would not satisfy
all risk conditions. What if we consider the UFFDIO_COPY ioctl in?
AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
the question is, can it quickly trigger another page fault?
yes, it can
Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
showing vcpu-thread X as well):
receive-thread fault-thread vcpu-thread X
-------------- ------------ -------------
fault at addr A1
fault_addr[X]=A1
UFFDIO_COPY page A1
check fault_addr[X] with A1
if match, clear fault_addr[X]
vcpu X starts
This is fine.
While since "vcpu X starts" can be right after UFFDIO_COPY, can this
be possible?
Previous picture isn't possible, due to mark_postcopy_blocktime_end
is being called right after ioctl, and vCPU is waking up
inside ioctl, so check fault_addr will be after vcpu X starts.
receive-thread fault-thread vcpu-thread X
-------------- ------------ -------------
fault at addr A1
fault_addr[X]=A1
UFFDIO_COPY page A1
vcpu X starts
fault at addr A2
fault_addr[X]=A2
check fault_addr[X] with A1
if match, clear fault_addr[X]
^
|
+---------- here it will not match since now fault_addr[X]==A2
Then looks like fault_addr[X], which is currently A1, will stall
again?
It will be another address(A2), but probably the same vCPU and if in
this case blocktime_start will be called before blocktime_end we lost
block time for page A1. Address of the page is unique key in this
process, not vCPU ;)