Re: [Qemu-devel] [RFC PATCH 0/2] Calculate downtime for postcopy live migration


From: Alexey Perevalov
Subject: Re: [Qemu-devel] [RFC PATCH 0/2] Calculate downtime for postcopy live migration
Date: Wed, 05 Apr 2017 17:33:40 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 04/04/2017 10:06 PM, Dr. David Alan Gilbert wrote:
* Alexey Perevalov (address@hidden) wrote:
Hi David,

I already asked you about downtime calculation for postcopy live migration.
As I remember, you said it was not worth calculating it per vCPU, or maybe I
understood you incorrectly. I decided to prove that it could be useful.
Thanks - apologies for taking so long to look at it.
Some higher-level thoughts:
    a) It needs to be switchable - the tree etc. looks like it could use a fair
       amount of RAM.
Are you worried about the memory overhead when downtime is being calculated?
I chose a tree due to lookup performance, but as an alternative it
could be a hash table. The tree population is on the hot path, but the
final calculation is not. Do you want it to be enabled on demand,
as a capability (such as compress)?
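If you mean a capability, I was thinking of something along these lines, modelled
on the existing migrate_use_compression() check. This is only a sketch:
MIGRATION_CAPABILITY_POSTCOPY_DOWNTIME and migrate_postcopy_downtime() are
invented names and the capability would still have to be added to
MigrationCapability in the QAPI schema.

/* In migration/migration.c; hypothetical on/off switch, modelled on
 * migrate_use_compression().  MIGRATION_CAPABILITY_POSTCOPY_DOWNTIME is an
 * invented name here. */
static bool migrate_postcopy_downtime(void)
{
    MigrationState *s = migrate_get_current();

    return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_DOWNTIME];
}

The fault-path bookkeeping (the tree insertion) would then be skipped entirely
when the capability is off, so the extra RAM is only used when somebody asks
for the statistics.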
    b) The cpu bitmask is a problem given we can have more than 64 CPUs
Here I agree it's not so scalable; I thought about a straightforward
char/bool array, or inventing a bit operation over a memory region.
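For example, QEMU's generic bitmap helpers would already cover more than 64
vCPUs; a rough sketch, where the variable and function names are only
illustrative and nr_vcpus would come from the configured vCPU count:

#include "qemu/bitmap.h"    /* bitmap_new() */
#include "qemu/bitops.h"    /* set_bit(), clear_bit() */

/* One bit per vCPU instead of a single 64-bit mask. */
static unsigned long *vcpu_blocked;

static void downtime_vcpu_bitmap_init(unsigned int nr_vcpus)
{
    vcpu_blocked = bitmap_new(nr_vcpus);   /* allocated zero-filled */
}

static void downtime_vcpu_block(int cpu_index)
{
    set_bit(cpu_index, vcpu_blocked);      /* vCPU stalled on a userfault */
}

static void downtime_vcpu_release(int cpu_index)
{
    clear_bit(cpu_index, vcpu_blocked);    /* requested page has arrived */
}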

    c) Tracing the pages that took the longest can be interesting - I've done
       graphs of latencies before - you get fun things like watching messes
       where you lose requests and the page eventually arrives anyway after
       a few seconds.
This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
of the QEMU master branch. It requires the commit
"userfaultfd: provide pid in userfault uffd_msg" from Andrea's git repository.

When I tested it I found the following points strange:
1. The first userfault always occurs due to an access to RAM in vapic_map_rom_writable;
all vCPUs are sleeping at that time.
That's probably not too surprising - I bet the vapic device load code does that?
I've sometimes wondered about preloading the queue on the source with some pages
that we know will need to be loaded early.
Yes, it's the vapic device initialization. Do you mean earlier than discarding RAM
blocks? I think the vapic configuration on the destination is the same as on the
source machine, so maybe it's not necessary to request it and wait.


2. The latter half of all userfaults was initiated by kworkers, which is why I had
a doubt about whether current in handle_userfault inside the kernel is the proper
task_struct for the page fault initiator. All vCPUs were sleeping at that moment.
When you say kworkers - which ones?  I wonder what they are - perhaps incoming
network packets using vhost?
No, in that scenario I used a tap network device. Unfortunately, I didn't track
down who it was, but I will.
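For attributing the faults, what I do is roughly the following lookup (simplified,
not the exact patch code): the pid delivered in uffd_msg is compared against the
vCPU thread ids QEMU already records, and anything that matches no vCPU (kworkers,
vhost, the postcopy fault thread itself) is counted separately.

#include "qom/cpu.h"    /* CPUState, CPU_FOREACH, cpu->thread_id (2.9-era path) */

/* Return the vCPU index for the faulting pid, or -1 for non-vCPU threads. */
static int faulting_vcpu_index(uint32_t pid)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        if ((uint32_t)cpu->thread_id == pid) {
            return cpu->cpu_index;
        }
    }
    return -1;    /* kworker, vhost, or the fault thread itself */
}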

3. Also there is a discrepancy between the vCPU state and the real vCPU thread state.
What do you mean by that?
I mean the status in /proc versus QEMU's internal vCPU status for the page-faulted
vCPU thread.
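Concretely, the cross-check looks something like this (a simplified, standalone
sketch): read the scheduler state letter from /proc/<pid>/task/<tid>/stat for the
vCPU thread and compare it with what QEMU itself reports for that vCPU.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the scheduler state letter (R, S, D, ...) of one thread, or '?'
 * on error.  The comm field may contain spaces (e.g. "CPU 0/KVM"), so
 * parse from the last ')' rather than with a plain %s. */
static char task_state(pid_t pid, pid_t tid)
{
    char path[64], buf[256], state = '?';
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", (int)pid, (int)tid);
    f = fopen(path, "r");
    if (!f) {
        return state;
    }
    if (fgets(buf, sizeof(buf), f)) {
        char *p = strrchr(buf, ')');
        if (p && p[1] == ' ' && p[2]) {
            state = p[2];
        }
    }
    fclose(f);
    return state;
}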


This patch is just for showing an idea; if you are ok with it, the non-RFC patch
will not include the /proc access and the large number of traces.
Also I think it is worth keeping postcopy_downtime in MigrationIncomingState and
returning the calculated downtime to the src side, where query-migrate will be
invoked.
I don't think it's worth it; we can always ask the destination, and sending stuff
back to the source is probably messy - especially at the end.

Dave

Alexey Perevalov (2):
   userfault: add pid into uffd_msg
   migration: calculate downtime on dst side

  include/migration/migration.h     |  11 ++
  linux-headers/linux/userfaultfd.h |   1 +
  migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
  migration/postcopy-ram.c          |  61 +++++++++-
  migration/savevm.c                |   2 +
  migration/trace-events            |  10 +-
  6 files changed, 319 insertions(+), 4 deletions(-)

--
1.8.3.1

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK





--
Best regards,
Alexey Perevalov


