
Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large mem


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps
Date: Mon, 8 Aug 2011 15:02:31 +0100

On Mon, Aug 8, 2011 at 9:42 AM, Shribman, Aidan <address@hidden> wrote:
>> -----Original Message-----
>> From: Stefan Hajnoczi [mailto:address@hidden
>> Sent: Tuesday, August 02, 2011 9:06 PM
>> To: Shribman, Aidan
>> Cc: address@hidden; Anthony Liguori
>> Subject: Re: [PATCH v3] XBZRLE delta for live migration of large memory apps
>>
>> On Tue, Aug 02, 2011 at 03:45:56PM +0200, Shribman, Aidan wrote:
>> > Subject: [PATCH v3] XBZRLE delta for live migration of large memory apps
>> > From: Aidan Shribman <address@hidden>
>> >
>> > By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce
>> > VM downtime and total live-migration time for VMs running memory
>> > write intensive workloads typical of large enterprise applications
>> > such as SAP ERP Systems and, generally speaking, of any application
>> > with a sparse memory update pattern.
>> >
>> > On the sender side XBZRLE is used as a compact delta encoding of page
>> > updates, retrieving the old page content from an LRU cache (default
>> > size of 64 MB). The receiving side uses the existing page content and
>> > XBZRLE to decode the new page content.
>> >
>> > Work was originally based on research results published at VEE 2011:
>> > Evaluation of Delta Compression Techniques for Efficient Live
>> > Migration of Large Virtual Machines by Benoit, Svard, Tordsson and
>> > Elmroth. Additionally the delta encoder XBRLE was improved further
>> > using XBZRLE instead.
>> >
>> > XBZRLE has a sustained bandwidth of 1.5-2.2 GB/s for typical
>> > workloads, making it ideal for in-line, real-time encoding such as is
>> > needed for live-migration.
>>
>> What is the CPU cost of xbzrle live migration on the source host?  I'm
>> thinking about a graph showing CPU utilization (e.g. from mpstat(1))
>> that has two datasets: migration without xbzrle and migration with
>> xbzrle.
>>
>
> xbzrle.out indicates that xbzrle is using 50% of the compute capacity during
> the xbzrle live migration (which completed in a few seconds). In vanilla.out,
> between 30% and 60% of compute is directed toward the live migration itself;
> in this case the live migration is not able to complete.
>
> -----
>
> address@hidden:~#
> address@hidden:~# cat xbzrle.out
> Linux 2.6.35-22-server (ilrsh01)        08/07/2011      _x86_64_        (2 CPU)
>
> 10:55:37 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest    %idle
> 10:55:38 AM  all   40.50    0.00    1.00    1.50    0.00    9.00    0.00    0.00   48.00
> 10:55:38 AM    0    0.00    0.00    1.00    3.00    0.00    0.00    0.00    0.00   96.00
> 10:55:38 AM    1   81.00    0.00    1.00    0.00    0.00   18.00    0.00    0.00    0.00

Too bad mpstat %guest is not being displayed correctly here.  That
would make it much easier to see how much CPU is spent executing guest
code and how much doing live migration.  Old system?

>> > +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
>> > +    const size_t max_compressed_len)
>> > +{
>> > +    int compressed_len;
>> > +
>> > +    xor_encode_word(xor_buf, old, curr);
>> > +    compressed_len = rle_encode((uint64_t *)xor_buf,
>> > +        sizeof(xor_buf)/sizeof(uint64_t), xbzrle_buf,
>> > +        sizeof(xbzrle_buf));
>> > +    if (compressed_len > max_compressed_len) {
>> > +        return -1;
>> > +    }
>> > +    memcpy(xbzrle, xbzrle_buf, compressed_len);
>>
>> Why the intermediate xbzrle_buf buffer and why the memcpy()?
>
> xbzrle encoding may take up to 150% of the original size in a rare worst-case
> scenario. To avoid having to check on each xbzrle iteration, or alternatively
> adding a loop that checks for overflow potential during the xbzrle encoding,
> I use xbzrle_buf as a working area. memcpy() is a factor faster than xbzrle,
> so its slowdown is insignificant.

I missed that the encode/decode functions do not check their dlen
parameter.  dlen is unused and should be removed.
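
To illustrate the alternative to the scratch buffer, here is a minimal
sketch of a delta encoder that checks the destination length once per
record and writes straight into the caller's buffer, so no intermediate
copy is needed. The record format ([zero-run length][literal length]
[literal XOR bytes], byte granularity) and the sketch_delta_* names are
hypothetical and are not taken from the patch, which works on 64-bit
words and uses its own encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record format (NOT the patch's): repeated records of
 *   [zero-run length 0..255][literal length 0..255][literal XOR bytes]
 * Returns the encoded length, or -1 if dst is too small.
 */
static int sketch_delta_encode(uint8_t *dst, size_t dst_len,
                               const uint8_t *old, const uint8_t *cur,
                               size_t len)
{
    size_t i = 0, d = 0;

    while (i < len) {
        size_t zrun = 0, nzrun = 0;

        /* Count unchanged bytes (their XOR would be zero). */
        while (i + zrun < len && zrun < 255 &&
               old[i + zrun] == cur[i + zrun]) {
            zrun++;
        }
        i += zrun;

        /* Count changed bytes that follow. */
        while (i + nzrun < len && nzrun < 255 &&
               old[i + nzrun] != cur[i + nzrun]) {
            nzrun++;
        }

        /* One bounds check per record instead of a scratch buffer. */
        if (dst_len - d < 2 + nzrun) {
            return -1;
        }
        dst[d++] = (uint8_t)zrun;
        dst[d++] = (uint8_t)nzrun;
        for (size_t k = 0; k < nzrun; k++) {
            dst[d++] = old[i + k] ^ cur[i + k];
        }
        i += nzrun;
    }
    return (int)d;
}

/*
 * Applies the delta in place: page holds the old content on entry and
 * the new content on success. Returns 0 on success, -1 on bad input.
 */
static int sketch_delta_decode(uint8_t *page, size_t len,
                               const uint8_t *src, size_t src_len)
{
    size_t i = 0, s = 0;

    while (s < src_len) {
        if (src_len - s < 2) {
            return -1;
        }
        size_t zrun = src[s++];
        size_t nzrun = src[s++];

        /* Reject records that run past the page or the input. */
        if (zrun > len - i || nzrun > len - i - zrun ||
            nzrun > src_len - s) {
            return -1;
        }
        i += zrun;
        for (size_t k = 0; k < nzrun; k++) {
            page[i++] ^= src[s++];
        }
    }
    return i == len ? 0 : -1;
}
```

With a check like this the caller could pass max_compressed_len directly
as dst_len and handle -1 the same way the quoted xbzrle_encode() handles
an oversized result. Whether the per-record check costs more than the
memcpy() it replaces would need measuring.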

Stefan


