qemu-devel

Re: [Qemu-devel] [POC] colo-proxy in qemu


From: zhanghailiang
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
Date: Thu, 30 Jul 2015 20:42:28 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
* Gonglei (address@hidden) wrote:
On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
* Jason Wang (address@hidden) wrote:


On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
* Dong, Eddie (address@hidden) wrote:
A question here: packet comparison may be very tricky. For example,
some protocols use random data to generate an unpredictable ID or
something similar. One example is ipv6_select_ident() in Linux. So does COLO
need a mechanism to make sure the PVM and SVM generate the same random
data?
Good question; connections that use random data are a big problem for COLO. At
present, they trigger checkpoint processing because the random data differs.
I don't think any mechanism can ensure that two different machines generate the
same random data. If you have any ideas, please tell us :)

Frequent checkpoints can handle this scenario, but may cause poor
performance. :(
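
To make the problem concrete, here is a minimal sketch of the comparison step, assuming the proxy sees the output of both VMs as ordered lists of raw packets. The names make_packet and compare_outputs and the 16-bit identifier layout are hypothetical, chosen only for illustration; they are not taken from the colo-proxy code.

```python
def make_packet(ident, payload):
    # Hypothetical packet layout: a 16-bit identifier followed by the payload.
    return ident.to_bytes(2, "big") + payload


def compare_outputs(pvm_packets, svm_packets):
    """Compare the output of the primary (PVM) and secondary (SVM).

    Returns ("release", packets) when both sides produced identical output,
    or ("checkpoint", index) with the index of the first divergence.
    """
    for i, (p, s) in enumerate(zip(pvm_packets, svm_packets)):
        if p != s:
            return ("checkpoint", i)
    if len(pvm_packets) != len(svm_packets):
        # One side produced extra packets: treat that as a divergence too.
        return ("checkpoint", min(len(pvm_packets), len(svm_packets)))
    return ("release", list(pvm_packets))


# Same payload, but each VM picked its own random identifier.
pvm = [make_packet(0x1234, b"GET /")]
svm = [make_packet(0x9ABC, b"GET /")]
print(compare_outputs(pvm, svm))  # ('checkpoint', 0)
```

Even though the payload is identical, the differing identifier alone is enough to force a checkpoint in this sketch, which is the behaviour described above.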

The assumption is that, after a VM checkpoint, the SVM and PVM have identical
internal state, so the pattern used to generate random data is highly
likely to produce identical data for a short time, at least...
They do diverge pretty quickly though; I have simple examples which
reliably cause a checkpoint because of simple randomness in applications.

Dave


And it will become even worse if hwrng is used in guest.

Yes; it seems quite application dependent. On IPv4, an ssh connection,
once established, tends to work well without triggering checkpoints,
and static web pages also work well. Examples of things that do cause
more checkpoints are displaying guest statistics (e.g. running top
in that ssh session), which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link on
its front page was a fun one). I think establishing
new encrypted connections also causes the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.
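
As a rough illustration of that difference, the arithmetic can be sketched as follows. The 3-second figure is an assumption standing in for "a few seconds", and checkpoints_per_minute is a hypothetical helper, not part of any COLO code.

```python
def checkpoints_per_minute(interval_ms):
    """Number of checkpoints taken in one minute at a fixed interval."""
    return 60_000 // interval_ms


periodic = checkpoints_per_minute(10)    # ~100 checkpoints per second
colo = checkpoints_per_minute(3_000)     # assumed: one checkpoint every 3 s
print(periodic, colo, periodic // colo)  # 6000 20 300
```

Under these assumed numbers, the COLO case takes 300 times fewer checkpoints than the simple periodic case over the same minute.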


Yes. That's the truth.
We can set two different modes for different scenarios, maybe named
1) frequent checkpoint mode for multi-connection and randomness scenarios
and 2) non-frequent checkpoint mode for other scenarios.

But that's the next plan; we are thinking about that.

I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.
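
The switching logic described above could look roughly like the sketch below. This is not the actual code mentioned in the message: the class name, the mode names, the thresholds, and the window sizes are all assumptions made for illustration.

```python
class ModeSwitcher:
    """Choose between COLO checkpoints and plain periodic checkpoints.

    All parameters are hypothetical defaults, not values from QEMU.
    """

    def __init__(self, short_threshold=0.1, window=5, probe_every=20):
        self.short_threshold = short_threshold  # seconds; below this is "short"
        self.window = window                    # consecutive short lengths needed
        self.probe_every = probe_every          # periodic checkpoints before re-probing
        self.mode = "colo"
        self.recent = []
        self.periodic_count = 0

    def record(self, length):
        """Record one finished checkpoint and return the mode for the next one.

        `length` is the time in seconds since the previous checkpoint; it is
        only meaningful (and only used) while in COLO mode.
        """
        if self.mode == "colo":
            self.recent = (self.recent + [length])[-self.window:]
            if (len(self.recent) == self.window
                    and all(l < self.short_threshold for l in self.recent)):
                # Consistently short: stop comparing, tell the secondary not to run.
                self.mode = "periodic"
                self.periodic_count = 0
                self.recent = []
        else:
            self.periodic_count += 1
            if self.periodic_count >= self.probe_every:
                # Every so often, flip back to COLO to re-measure.
                self.mode = "colo"
                self.recent = []
        return self.mode


sw = ModeSwitcher(short_threshold=0.1, window=3, probe_every=2)
print([sw.record(0.05) for _ in range(3)])  # ['colo', 'colo', 'periodic']
```

In this sketch the returned mode stands in for the message byte sent to the secondary at the start of each checkpoint. After flipping back to COLO, it requires a full window of short checkpoints before switching away again; a shorter probe window would be an equally plausible choice.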


Do you mean that if there are consistent checkpoint requests, you do not do a
COLO checkpoint but just send a special message to the SVM?
And you resume the common COLO mode once the checkpoint lengths are no longer short?

Thanks.

Dave


Regards,
-Gonglei

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
