Re: [Qemu-devel] Abnormal observation during migration: too many "write-

From:	Juan Quintela
Subject:	Re: [Qemu-devel] Abnormal observation during migration: too many "write-not-dirty" pages
Date:	Wed, 15 Nov 2017 10:45:44 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

"Chunguang Li" <address@hidden> wrote:
> Hi all! 

Hi

Sorry for the delay, I was on vacation and am still getting up to speed.

> I got a very abnormal observation for the VM migration. I found that many
> pages marked as dirty during migration are "not really dirty", that is,
> their content is the same as the old version.

I think your test is quite good, and I am rather ashamed of the result:
80% "false" dirty pages is really a lot.

> I did the migration experiment like this: 
>
> During the setup phase of migration, first I suspended the VM. Then I copied
> all the pages within the guest physical address space to a memory buffer as
> large as the guest memory size. After that, the dirty tracking began and I
> resumed the VM. Besides, at the end of each iteration, I also suspended the
> VM temporarily. During the suspension, I compared the content of all the
> pages marked as dirty in this iteration byte-by-byte with their former
> copies inside the buffer. If the content of one page was the same as its
> former copy, I recorded it as a "write-not-dirty" page (the page is written
> with exactly the same content as the old version). Otherwise, I replaced
> this page in the buffer with the new content, for possible comparison in
> the future. After the reset of the dirty bitmap, I resumed the VM. Thus, I
> obtained the proportion of write-not-dirty pages within all the pages marked
> as dirty for each pre-copy iteration.


vhost and friends could make a small difference here, but in general,
this approach should be ok.
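
To check that I read the procedure correctly, here is a rough sketch in C
of what I understand one end-of-iteration scan to be. It is not QEMU code:
scan_iteration(), struct iter_stats, page_is_zero() and GUEST_PAGE_SIZE
are names made up for illustration, the dirty bitmap is simplified to one
byte per page, and skipping zero pages follows what you say further down
about not counting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_PAGE_SIZE 4096   /* assumed page size, for illustration only */

struct iter_stats {
    size_t dirty;            /* pages marked dirty (zero pages excluded) */
    size_t write_not_dirty;  /* marked dirty but identical to the old copy */
};

/* Hypothetical helper: 1 if the page holds only zero bytes, else 0. */
static int page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < GUEST_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/*
 * guest:  current guest memory, npages * GUEST_PAGE_SIZE bytes
 * shadow: copy taken during setup, refreshed when a page really changes
 * dirty:  one byte per page, non-zero if marked dirty in this iteration
 */
static struct iter_stats scan_iteration(const uint8_t *guest, uint8_t *shadow,
                                        uint8_t *dirty, size_t npages)
{
    struct iter_stats st = { 0, 0 };

    assert(guest && shadow && dirty);
    for (size_t i = 0; i < npages; i++) {
        const uint8_t *cur = guest + i * GUEST_PAGE_SIZE;
        uint8_t *old = shadow + i * GUEST_PAGE_SIZE;

        if (!dirty[i]) {
            continue;
        }
        dirty[i] = 0;                       /* reset the bitmap entry */
        if (page_is_zero(cur)) {
            memcpy(old, cur, GUEST_PAGE_SIZE);  /* keep shadow in sync */
            continue;                       /* zero pages are not counted */
        }
        st.dirty++;
        if (memcmp(cur, old, GUEST_PAGE_SIZE) == 0) {
            st.write_not_dirty++;           /* written, but content unchanged */
        } else {
            memcpy(old, cur, GUEST_PAGE_SIZE);  /* remember the new content */
        }
    }
    return st;
}
```

The proportion for one iteration would then be write_not_dirty / dirty,
whenever dirty is non-zero.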

> I repeated this experiment with 15 workloads: 11 CPU2006 benchmarks, a
> Memcached server, kernel compilation, playing a video, and an idle VM. The
> CPU2006 benchmarks and Memcached are write-intensive workloads, so almost
> all of them did not converge to stop-copy.

That is the impressive part, 15 workloads.  Thanks for taking the effort.

BTW, do you have your qemu changes handy?  I would like to be able to test
locally, and to "review" how you measure things.


> Startlingly, the proportions of the write-not-dirty pages are quite high.
> Memcached and three CPU2006 benchmarks (zeusmp, mcf and bzip2) have the
> highest proportions. Their proportions of write-not-dirty pages within all
> the dirty pages are as high as 45%-80%.

Or the workload does really stupid things like:

a = 0;
a = 1;
a = 0;

This makes no sense at all.
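
To spell out why such a pattern would show up as a "false" dirty page:
dirty logging based on write protection records that a write happened,
not whether the value ended up different. A toy model in C, where struct
tracked_page and tracked_store() are invented for this mail and have
nothing to do with the real KVM code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096    /* assumed page size, for illustration only */

struct tracked_page {
    uint8_t data[DEMO_PAGE_SIZE];
    int dirty;                 /* set on any store, whatever the value */
};

/* Hypothetical store path: every write marks the page dirty. */
static void tracked_store(struct tracked_page *p, size_t off, uint8_t val)
{
    assert(off < DEMO_PAGE_SIZE);
    p->data[off] = val;
    p->dirty = 1;              /* the tracker cannot see that val is old */
}
```

Storing 0, then 1, then 0 at the same offset of a zeroed page leaves the
dirty flag set while the content is byte-for-byte what it was before.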

Just in case, could you try to test this with xbzrle?  It should go well
with this use case (but you need a cache big enough to hold a useful part
of the guest memory).
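
The reason I expect xbzrle to help is that it sends a delta against the
cached copy of the page, so a page whose content did not change produces
an empty delta. A much simplified illustration in C follows; it only
counts the bytes that differ after an XOR and does not do the run-length
encoding of the real encoder. xor_delta_nonzero() is a made-up name, not
a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the bytes that differ between the cached page and the current
 * page, i.e. the non-zero bytes of old XOR new.  A result of 0 means the
 * page was marked dirty but carries nothing new.
 */
static size_t xor_delta_nonzero(const uint8_t *old_page,
                                const uint8_t *new_page, size_t len)
{
    size_t changed = 0;

    assert(old_page && new_page);
    for (size_t i = 0; i < len; i++) {
        if ((old_page[i] ^ new_page[i]) != 0) {
            changed++;
        }
    }
    return changed;
}
```

For a write-not-dirty page this returns 0, so with a cache hit almost
nothing would need to go over the wire for it.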


> The proportions of the other workloads are about 5%-20%, which are also
> abnormal. According to my intuition, the proportion of write-not-dirty pages
> should be far less than these numbers. I think it should be quite a
> particular case that one page is written with exactly the same content as
> the former data.

I agree with that.

> Besides, zero pages are not counted in any of the results, because I think
> code like memset() may zero large areas of pages that were already zero
> pages before.
>
> I excluded some possible unknown causes in the machine hardware, because I
> repeated the experiments with two different sets of machines. Then I guessed
> it might be related to the huge page feature. However, the result was the
> same when I turned the huge page feature off in the OS.

Huge pages could have caused that.  Remember that we also have transparent
huge pages.  I have to look at that code.

> Now there are only two possible reasons in my opinion. 
>
> First, there are some bugs in the KVM kernel dirty tracking mechanism. It may
> mark as dirty some pages that do not receive any write request.

That is a possibility.

> Second, there are some bugs in the OS running inside the VM. It may issue
> some unnecessary write requests.
>
> What do you think about this abnormal phenomenon? Any advice, possible
> reasons or even guesses? I appreciate any responses, because it has confused
> me for a long time. Thank you.

I would like to reproduce this.

Thanks for bringing this to our attention.

Later, Juan.
