[Qemu-devel] Migration speed throttling, max_throttle in migration.c


From: Thomas Treutner
Subject: [Qemu-devel] Migration speed throttling, max_throttle in migration.c
Date: Wed, 09 Feb 2011 19:13:49 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7

Hi,

I was reading qemu's (qemu-kvm-0.13.0's, to be specific) live migration code to understand how the iterative dirty page transfer is implemented. While doing so I noticed that ram_save_live in arch_init.c is called quite often, more often than I expected (approx. 200 times for an idle 500MiB VM). I found out that this is because the loop while (!qemu_file_rate_limit(f)) hits the rate limit very often and exits, and as there are still dirty pages remaining, ram_save_live is called again.

As I had set no bandwidth limit in the libvirt call, I dug deeper and found a hard-coded maximum bandwidth in migration.c:

/* Migration speed throttling */
static uint32_t max_throttle = (32 << 20);

Using a packet sniffer I verified that max_throttle is in bytes/s, so 32 MiB/s here. Additionally, it translates directly to network bandwidth - I was not sure about that, as the bandwidth measured in ram_save_live seems to be buffer/memory subsystem bandwidth rather than what actually goes over the wire.

Anyway, I'm wondering why exactly *this* value was chosen as a hard-coded limit. 32 MiB/s is roughly 268 Mbit/s, which is *both* much more than 100Mbit/s Ethernet can carry and much less than Gbit Ethernet can cope with. So in the first case, TCP congestion control has to take over anyway, and in the second, roughly 3/4 of the bandwidth is thrown away.
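To make the arithmetic above easy to check, here is a small C sketch that converts a byte/s throttle into Mbit/s and into a fraction of a given link. Only max_throttle and its value come from migration.c; the helper names bytes_per_sec_to_mbit and link_utilisation are made up for illustration and are not part of qemu.

```c
#include <assert.h>
#include <stdint.h>

/* Default quoted above from migration.c (qemu-kvm-0.13.0), in bytes/s. */
static const uint32_t max_throttle = (32 << 20);

/* Hypothetical helper: convert bytes/s to decimal Mbit/s,
 * the unit Ethernet link speeds are quoted in. */
static double bytes_per_sec_to_mbit(uint64_t bytes_per_sec)
{
    return (double)bytes_per_sec * 8.0 / 1e6;
}

/* Hypothetical helper: fraction of a link (given in Mbit/s) that the
 * throttle allows to be used, capped at 1.0 when the throttle exceeds
 * the link speed. */
static double link_utilisation(uint64_t bytes_per_sec, double link_mbit)
{
    assert(link_mbit > 0.0);
    double u = bytes_per_sec_to_mbit(bytes_per_sec) / link_mbit;
    return u > 1.0 ? 1.0 : u;
}
```

With these helpers, bytes_per_sec_to_mbit(max_throttle) comes out at about 268.4, link_utilisation(max_throttle, 100.0) is 1.0 (the throttle is above what the link can carry), and link_utilisation(max_throttle, 1000.0) is about 0.27, i.e. close to three quarters of a Gbit link stay unused.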

As I'm using Gbit Ethernet, I experimented with different values. With max_throttle = (112 << 20); - which is ~940 Mbit/s - my Gbit network is nicely saturated, and live migrations of a rather idle 700MiB VM take ~5s instead of ~15s without any problems, which is very nice. Much more important is that VMs with higher memory activity, and therefore higher page dirtying rates, are transferred more easily without additional manual action. The default maxdowntime is 30ms, which is often unreachable in such situations, and there is no evasive action built in, such as a maximum number of iterations followed by a forced last iteration, or aborting the migration when this limit is reached.
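The kind of evasive action meant above could look roughly like the following C sketch. It is not qemu code: the enum, the function next_action, and the max_iterations parameter are all hypothetical and only illustrate the idea. The sketch assumes the expected downtime is simply the remaining dirty bytes divided by the throttle bandwidth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decision taken after each dirty-page iteration. */
enum migrate_action {
    MIG_ITERATE,       /* keep sending dirty pages while the guest runs */
    MIG_FINISH,        /* downtime target reachable: stop guest, send rest */
    MIG_FORCE_FINISH   /* iteration cap hit: force the last iteration */
};

/* Hypothetical policy: finish when the remaining data fits into the
 * allowed downtime, otherwise iterate until max_iterations is reached.
 * A caller could equally abort the migration instead of forcing it. */
static enum migrate_action next_action(uint64_t remaining_bytes,
                                       uint64_t bytes_per_sec,
                                       uint64_t max_downtime_ns,
                                       unsigned iteration,
                                       unsigned max_iterations)
{
    assert(bytes_per_sec > 0);

    /* Assumed model: time to push the remaining dirty bytes at full rate. */
    double expected_ns =
        (double)remaining_bytes / (double)bytes_per_sec * 1e9;

    if (expected_ns <= (double)max_downtime_ns) {
        return MIG_FINISH;
    }
    if (iteration >= max_iterations) {
        return MIG_FORCE_FINISH;
    }
    return MIG_ITERATE;
}
```

Under this model, 1 MiB of remaining dirty memory needs about 31 ms at 32 MiB/s, so a 30 ms downtime target is just missed and the migration keeps iterating, whereas at 112 MiB/s the same 1 MiB needs about 9 ms and the migration can finish right away.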

So, I'm asking whether there is a good reason *not* to change max_throttle to a value that aims at saturating a Gbit network, given that 100Mbit networks will be "flooded" by the current setting anyway.


thanks & regards,
-t
