
Re: [Qemu-devel] RFC migration of zero pages


From: Peter Lieven
Subject: Re: [Qemu-devel] RFC migration of zero pages
Date: Thu, 31 Jan 2013 12:53:12 +0100

RFC patch is attached. Comments appreciated.
I have two concerns left:
a) What happens if a page turns from zero to non-zero during the first
stage? Is the page transferred in the same round or in the next?
b) What happens if live migration fails or is aborted and a new migration
is then started to the same target (if that is possible)? Is the memory
at the target reinitialized?

Am 31.01.2013 um 10:37 schrieb Orit Wasserman <address@hidden>:

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>> 
>> Am 31.01.2013 um 10:19 schrieb Orit Wasserman <address@hidden>:
>> 
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
>>>> 
>>>> Am 31.01.2013 um 09:59 schrieb Orit Wasserman <address@hidden>:
>>>> 
>>>>> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>>>>>> 
>>>>>> Am 31.01.2013 um 09:33 schrieb Orit Wasserman <address@hidden>:
>>>>>> 
>>>>>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
>>>>>>>> 
>>>>>>>> Am 31.01.2013 um 08:47 schrieb Orit Wasserman <address@hidden>:
>>>>>>>> 
>>>>>>>>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I just came across an idea and would like to get feedback on
>>>>>>>>>> whether it makes sense or not.
>>>>>>>>>> 
>>>>>>>>>> If a VM is started without preallocated memory, all memory that
>>>>>>>>>> has not been written to reads as zeros, right?
>>>>>>>>> Hi,
>>>>>>>>> No, the memory will be unmapped (we allocate on demand).
>>>>>>>> 
>>>>>>>> Yes, but those unmapped pages will read as zeroes if the guest
>>>>>>>> accesses them?
>>>>>>> yes.
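
(A tiny standalone demo of that premise, assuming guest RAM without
mem-prealloc is backed by anonymous mmap'ed host memory: pages that were
never written read back as zeroes. Hypothetical example, not QEMU code:)

/* untouched anonymous memory reads back as zero */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 16 * 4096;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    unsigned char zero[4096] = { 0 };

    assert(p != MAP_FAILED);
    /* reading only maps the shared zero page; contents are all zeroes */
    assert(memcmp(p, zero, sizeof(zero)) == 0);
    printf("untouched anonymous memory reads as zero\n");
    munmap(p, len);
    return 0;
}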
>>>>>>>> 
>>>>>>>>>> If a VM with a lot of unwritten memory is migrated, or if the
>>>>>>>>>> memory contains a lot of zeroed-out pages (e.g. a Windows or
>>>>>>>>>> Linux guest with page sanitization), all this memory is
>>>>>>>>>> allocated on the target during live migration. Especially with
>>>>>>>>>> KSM this leads to the problem that the memory is allocated and
>>>>>>>>>> might not be completely available, because merging of the pages
>>>>>>>>>> happens asynchronously.
>>>>>>>>>> 
>>>>>>>>>> Wouldn't it make sense not to send zero pages in the first
>>>>>>>>>> round, where the complete RAM is sent (if it is detectable that
>>>>>>>>>> we are in this stage)?
>>>>>>>>> We send one byte per zero page at the moment (see is_dup_page);
>>>>>>>>> we could optimize further by not sending it at all.
>>>>>>>>> I have to point out that this is a very idle guest, and we need
>>>>>>>>> to work on a loaded guest, which is the harder problem in
>>>>>>>>> migration.
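
(For reference, a minimal standalone sketch of such a duplicate-page
check; the real is_dup_page() in arch_init.c may use wider, vectorized
compares. A zero page is simply the case where the repeated byte is 0:)

/* sketch: true if every byte of the page equals its first byte */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096              /* stand-in for TARGET_PAGE_SIZE */

static bool is_dup_page_sketch(const uint8_t *page)
{
    uint8_t val = page[0];
    int i;

    for (i = 1; i < PAGE_SIZE; i++) {
        if (page[i] != val) {
            return false;
        }
    }
    return true;                    /* zero page when val == 0 */
}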
>>>>>>>> 
>>>>>>>> I was not talking about saving one byte (+ 8 bytes for the
>>>>>>>> header); my concern was that we memset all (dup) pages, including
>>>>>>>> the special case of a zero dup page, on the migration target.
>>>>>>>> This allocates the memory, does it not?
>>>>>>>> 
>>>>>>> 
>>>>>>>> If my above assumption that the guest reads unmapped memory as
>>>>>>>> zeroes is right, this mapping is not necessary in the case of a
>>>>>>>> zero dup page.
>>>>>>>> 
>>>>>>>> We just have to make sure that we are still in the very first
>>>>>>>> round when deciding not to send a zero page, because otherwise it
>>>>>>>> could be a page that has become zero during migration, and that
>>>>>>>> of course has to be transferred.
>>>>>>> 
>>>>>>> OK, so if we don't send the pages, they won't be allocated on the
>>>>>>> destination, which can improve both memory usage and CPU
>>>>>>> consumption there.
>>>>>>> That can be good for an overcommit scenario.
>>>>>> 
>>>>>> Yes. On the source host those zero pages have likely all been
>>>>>> merged by KSM already, but on the destination they are allocated
>>>>>> and initially consume real memory. This can be a problem if a lot
>>>>>> of incoming migrations happen at the same time.
>>>>> 
>>>>> That can be very effective.
>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Also, I noticed that the bottleneck in migrating unmapped pages
>>>>>>>>> is the detection of those pages, because we map the pages in
>>>>>>>>> order to check them; for a large guest this is very expensive,
>>>>>>>>> as mapping a page results in a page fault on the host.
>>>>>>>>> So what would be very helpful is locating those pages without
>>>>>>>>> mapping them, which looks very complicated.
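
(One possible direction, sketched under the assumption that guest RAM is
anonymous host memory: on Linux, mincore() reports per-page residency
without faulting pages in, so never-touched pages show up as non-resident.
Swapped-out pages are reported as non-resident too, so this is only a
heuristic, and not QEMU code:)

/* count non-resident pages in a page-aligned host address range */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

static long count_nonresident(void *host_addr, size_t length)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (length + page_size - 1) / page_size;
    unsigned char *vec = malloc(pages);
    long nonresident = 0;
    size_t i;

    if (!vec || mincore(host_addr, length, vec) < 0) {
        free(vec);
        return -1;
    }
    for (i = 0; i < pages; i++) {
        if (!(vec[i] & 1)) {        /* bit 0 set = resident in RAM */
            nonresident++;
        }
    }
    free(vec);
    return nonresident;
}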
>>>>>>>> 
>>>>>>>> This would be a nice improvement, but as you said, a guest will
>>>>>>>> sooner or later allocate all memory if it is not totally idle.
>>>>>>>> However, large parts of this memory might have been reset to
>>>>>>>> zeroes. This happens on page deallocation in a Windows guest by
>>>>>>>> default and can also be enforced in Linux with page sanitization.
>>>>>>> 
>>>>>>> True, but in those cases we will want to zero the page on the
>>>>>>> destination, as this is done for security reasons.
>>>>>> 
>>>>>> If I migrate to a destination where initially all memory is
>>>>>> unmapped, not migrating the zero page turns it into an unmapped
>>>>>> page (which reads as zero?). Where is the security problem? It's
>>>>>> like re-thinning on storage.
>>>>>> Or do I misunderstand something here? Is the actual mapping
>>>>>> information migrated?
>>>>> 
>>>>> I was referring to pages that had some data and were migrated, so
>>>>> when the guest OS zeros them we need to zero them on the
>>>>> destination as well, because the data is also there.
>>>> 
>>>> OK, so with the current implementation, can we effectively decide
>>>> whether a page is being transferred for the first time?
>>> 
>>> In the old code (before 1.3 or 1.2) we had a separate function for
>>> the first full transfer, but now we don't.
>>> So I guess you will need to implement it; it shouldn't be too
>>> complicated. I would add a flag to the existing code.
>>>> 
>>>> Do we always migrate the complete memory once and then iterate over dirty 
>>>> pages? I have to check the code
>>>> that searches for dirty pages to confirm that.
>>> We set the whole bitmap as dirty at the beginning of migration, so in
>>> the first iteration all pages will be sent.
>>> The code is in arch_init.c; look at ram_save_setup and
>>> ram_save_iterate.
>> 
>> I will have a look and send an RFC patch once I have tested it.
> Great!

diff --git a/arch_init.c b/arch_init.c
index dada6de..33f3b12 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -426,6 +426,8 @@ static void migration_bitmap_sync(void)
  *           0 means no dirty pages
  */
 
+static uint64_t complete_rounds;
+
 static int ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_seen_block;
@@ -451,6 +453,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             if (!block) {
                 block = QTAILQ_FIRST(&ram_list.blocks);
                 complete_round = true;
+                if (!complete_rounds) {
+                    error_report("ram_save_block: finished bulk ram migration");
+                }
+                complete_rounds++;
             }
         } else {
             uint8_t *p;
@@ -463,10 +469,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             bytes_sent = -1;
             if (is_dup_page(p)) {
                 acct_info.dup_pages++;
-                bytes_sent = save_block_hdr(f, block, offset, cont,
+                /* we can skip transferring zero pages in the first round because
+                   memory is unmapped (reads as zero) at the target anyway or initialized
+                   to zero in case of mem-prealloc. */
+                if (complete_rounds || *p) {
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                    qemu_put_byte(f, *p);
+                    bytes_sent += 1;
+                } else {
+                    bytes_sent = 1;
+                }
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
@@ -569,6 +582,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
+    complete_rounds = 0;
     reset_ram_globals();
 
     if (migrate_use_xbzrle()) {
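
For clarity, the decision the second hunk makes for a duplicate page,
restated as a standalone helper (a sketch only, not part of the patch):
during the bulk round a zero page is skipped entirely, because untouched
(or preallocated) memory on the destination already reads as zero; after
the bulk round, or when the fill byte is non-zero, the header and fill
byte are still sent.

/* sketch: should a duplicate page (all bytes == fill_byte) be sent? */
#include <stdbool.h>
#include <stdint.h>

static bool send_dup_page(uint64_t complete_rounds, uint8_t fill_byte)
{
    /* skip only zero pages, and only during the bulk (first) round */
    return complete_rounds > 0 || fill_byte != 0;
}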




