
Re: [Qemu-devel] Testing migration under stress


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] Testing migration under stress
Date: Tue, 06 Nov 2012 18:55:34 +1100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121010 Thunderbird/16.0.1

On 06/11/12 17:55, David Gibson wrote:
> On Tue, Nov 06, 2012 at 04:22:11PM +1100, Alexey Kardashevskiy wrote:
>> On 02/11/12 23:12, Orit Wasserman wrote:
>>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>>> Asking for some advice on the list.
>>>>
>>>> I have prototype savevm and migration support ready for the pseries
>>>> machine.  They seem to work under simple circumstances (idle guest).
>>>> To test them more extensively I've been attempting to perform live
>>>> migrations (just over tcp to localhost) while the guest is active with
>>>> something.  In particular I've tried while using octave to do matrix
>>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>>> during some video encoding.

>>> As you are doing local migration, one option is to set the speed higher
>>> than the line speed (since we don't actually send the data over a real
>>> link); another is to set a high allowed downtime.
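
For concreteness, a minimal monitor sketch of both suggestions (the values are arbitrary examples, not recommendations):

(qemu) migrate_set_speed 10g
(qemu) migrate_set_downtime 2

migrate_set_speed takes a size with an optional B/K/M/G suffix; migrate_set_downtime takes seconds.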

>>>> However, in each of these cases, we've found that the migration only
>>>> completes and the source instance only stops after the intensive
>>>> workload has (just) completed.  What I surmise is happening is that
>>>> the workload is touching memory pages fast enough that the ram
>>>> migration code is never getting below the threshold to complete the
>>>> migration until the guest is idle again.
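
In other words, the migration can only enter its final stop-and-copy phase once (remaining ram) / (transfer rate) drops below the allowed downtime; a guest that dirties pages faster than the link drains them keeps "remaining ram" above that threshold indefinitely.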

>>> The workload you chose is really bad for live migration, as all the
>>> guest does is dirty its memory. I recommend looking for a workload that
>>> does some networking or disk IO.
>>> Vinod succeeded in running the SwingBench and SLOB benchmarks, which
>>> converged OK. I don't know if they run on pseries, but a similar
>>> workload (small database/warehouse) should be fine.
>>> We found that SpecJbb, on the other hand, is hard to converge.
>>> A web workload or video streaming also does the trick.
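
A trivial guest-side sketch of such an IO-bound workload (hypothetical, just to illustrate the idea; O_DIRECT bypasses the page cache, so few guest RAM pages get dirtied):

# run inside the guest
while true; do dd if=/dev/zero of=/tmp/stress.img bs=1M count=512 oflag=direct; done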


>> My ffmpeg workload is a simple encode of h263+ac3 to h263+ac3, at 64*36
>> pixels, so it should not be dirtying memory too much. Or is it?

> Oh.. if you're encoding the same format to the same format, it may well
> be optimized and therefore memory limited.

No, it is not optimized; it still fully decodes and re-encodes, as I inserted a filter into the chain.

> I was envisaging encoding an uncompressed format to a highly compressed
> format, which should be compute limited rather than memory bandwidth
> limited.
>
> The size and resolution of the input doesn't really matter as long as:
>            * the output size is much smaller than the input size

That is another scenario; I run both. I just tried reducing memory consumption, as was recommended here, to see if anything changes.

Originally it was 1280*720 to 64*36, but I am not sure that uses little memory: I suspect ffmpeg may (at least sometimes) decode a series of full-size frames to do motion detection or something similar.
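
For reference, the downscaling case is roughly a command of this shape (the exact codec and filter options were not spelled out, so this is a reconstruction):

ffmpeg -i input.avi -vf scale=64:36 -acodec ac3 output.avi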


> and        * it takes several minutes for the full encode to give a
>              reasonable amount of time for the migrate to converge.

90 seconds per file. If I run a script which does the encoding in a loop, the pauses between encodings are not long enough to finish the migration anyway when I encode a big video to a small one.

However, if it is 64*36, the migration finishes (the first qemu succeeds and stops) but the new guest does not resume.



>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> total time: 14538 milliseconds
>> downtime: 1273 milliseconds
>> transferred ram: 389961 kbytes
>> remaining ram: 0 kbytes
>> total ram: 1065024 kbytes
>> duplicate: 181949 pages
>> normal: 97446 pages
>> normal bytes: 389784 kbytes

>> How many bytes were actually transferred? "duplicate" * 4K = 745MB?
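
A rough cross-check, assuming 4 KiB pages: "normal bytes" is 97446 pages * 4 KiB = 389784 kbytes, and the numbers suggest the accounting adds about one byte per duplicate page (181949 bytes, roughly 178 kbytes), which lands almost exactly on the 389961 kbytes reported as transferred. Duplicate (zero) pages are sent as a short marker rather than a full page, so 745MB is the RAM they cover, not bytes on the wire.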

>> Is there any tool in QEMU to see how many pages are used/dirty/etc?
>> "info" does not seem to have any such statistic.
>>
>> btw the new guest did not resume (qemu still responds to commands)
>> but this is probably our problem within the "pseries" platform. What is

> Uh, that's a bug, and I'm not sure when it broke.  If the migration
> isn't even working, we're premature in trying to work out why it
> isn't completing when we expect.

Here I wanted to emphasize that I would like some way to get information about how the migration is doing (or how it went) on the new guest's side - there are no statistics about it.


--
Alexey


