Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large mem

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large mem

From:	Shribman, Aidan
Subject:	Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps
Date:	Mon, 8 Aug 2011 10:42:40 +0200
> -----Original Message-----
> From: Blue Swirl [mailto:address@hidden
> Sent: Tuesday, August 02, 2011 11:44 PM
> To: Shribman, Aidan
> Cc: Stefan Hajnoczi; address@hidden
> Subject: Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live
> migration of large memory apps
>
> On Tue, Aug 2, 2011 at 1:45 PM, Shribman, Aidan
> <address@hidden> wrote:
> > Subject: [PATCH v3] XBZRLE delta for live migration of
> large memory apps
> > From: Aidan Shribman <address@hidden>
> >
> > By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we
> can reduce VM downtime
> > and total live-migration time for VMs running memory write
> intensive workloads
> > typical of large enterprise applications such as SAP ERP
> Systems, and generally
> > speaking for representative of any application with a
> sparse memory update pattern.
> >
> > On the sender side XBZRLE is used as a compact delta
> encoding of page updates,
> > retrieving the old page content from an LRU cache (default
> size of 64 MB). The
> > receiving side uses the existing page content and XBZRLE to
> decode the new page
> > content.
> >
> > Work was originally based on research results published VEE
> 2011: Evaluation of
> > Delta Compression Techniques for Efficient Live Migration
> of Large Virtual
> > Machines by Benoit, Svard, Tordsson and Elmroth.
> Additionally the delta encoder
> > XBRLE was improved further using XBZRLE instead.
> >
> > XBZRLE has a sustained bandwidth of 1.5-2.2 GB/s for
> typical workloads making it
> > ideal for in-line, real-time encoding such as is needed for
> live-migration.
> >
> > A typical usage scenario:
> >    {qemu} migrate_set_cachesize 256m
> >    {qemu} migrate -x -d tcp:destination.host:4444
> >    {qemu} info migrate
> >    ...
> >    transferred ram-duplicate: A kbytes
> >    transferred ram-duplicate: B pages
> >    transferred ram-normal: C kbytes
> >    transferred ram-normal: D pages
> >    transferred ram-xbrle: E kbytes
> >    transferred ram-xbrle: F pages
> >    overflow ram-xbrle: G pages
> >    cache-hit ram-xbrle: H pages
> >    cache-lookup ram-xbrle: J pages
> >
> > Testing: live migration with XBZRLE completed in 110
> seconds, without live
> > migration was not able to complete.
> >
> > A simple synthetic memory r/w load generator:
> > ..    include <stdlib.h>
> > ..    include <stdio.h>
> > ..    int main()
> > ..    {
> > ..        char *buf = (char *) calloc(4096, 4096);
> > ..        while (1) {
> > ..            int i;
> > ..            for (i = 0; i < 4096 * 4; i++) {
> > ..                buf[i * 4096 / 4]++;
> > ..            }
> > ..            printf(".");
> > ..        }
> > ..    }
> >
> > Signed-off-by: Benoit Hudzia <address@hidden>
> > Signed-off-by: Petter Svard <address@hidden>
> > Signed-off-by: Aidan Shribman <address@hidden>
> >
> > --
> >
> >  Makefile.target   |    1 +
> >  arch_init.c       |  331
> ++++++++++++++++++++++++++++++++++++++++++++++-------
> >  block-migration.c |    3 +-
> >  hash.h            |   72 ++++++++++++
> >  hmp-commands.hx   |   36 ++++--
> >  hw/hw.h           |    3 +-
> >  lru.c             |  151 ++++++++++++++++++++++++
> >  lru.h             |   13 ++
> >  migration-exec.c  |    6 +-
> >  migration-fd.c    |    6 +-
> >  migration-tcp.c   |    6 +-
> >  migration-unix.c  |    6 +-
> >  migration.c       |  119 ++++++++++++++++++-
> >  migration.h       |   25 ++++-
> >  qmp-commands.hx   |   43 ++++++-
> >  savevm.c          |   13 ++-
> >  sysemu.h          |   13 ++-
> >  xbzrle.c          |  125 ++++++++++++++++++++
> >  xbzrle.h          |   12 ++
> >  19 files changed, 905 insertions(+), 79 deletions(-)
> >
> > diff --git a/Makefile.target b/Makefile.target
> > index 2800f47..b3215de 100644
> > --- a/Makefile.target
> > +++ b/Makefile.target
> > @@ -186,6 +186,7 @@ endif #CONFIG_BSD_USER
> >  ifdef CONFIG_SOFTMMU
> >
> >  obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
> > +obj-y += lru.o xbzrle.o
> >  # virtio has to be here due to weird dependency between
> PCI and virtio-net.
> >  # need to fix this properly
> >  obj-y += virtio-blk.o virtio-balloon.o virtio-net.o
> virtio-serial-bus.o
> > diff --git a/arch_init.c b/arch_init.c
> > old mode 100644
> > new mode 100755
> > index 4486925..5d18652
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -27,6 +27,7 @@
> >  #include <sys/types.h>
> >  #include <sys/mman.h>
> >  #endif
> > +#include <assert.h>
>
> Is this needed?

removed (see patch v4)

>
> >  #include "config.h"
> >  #include "monitor.h"
> >  #include "sysemu.h"
> > @@ -40,6 +41,17 @@
> >  #include "net.h"
> >  #include "gdbstub.h"
> >  #include "hw/smbios.h"
> > +#include "lru.h"
> > +#include "xbzrle.h"
> > +
> > +//#define DEBUG_ARCH_INIT
> > +#ifdef DEBUG_ARCH_INIT
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stdout, "arch_init: " fmt, ##
> __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> >
> >  #ifdef TARGET_SPARC
> >  int graphic_width = 1024;
> > @@ -88,6 +100,153 @@ const uint32_t arch_type = QEMU_ARCH;
> >  #define RAM_SAVE_FLAG_PAGE     0x08
> >  #define RAM_SAVE_FLAG_EOS      0x10
> >  #define RAM_SAVE_FLAG_CONTINUE 0x20
> > +#define RAM_SAVE_FLAG_XBZRLE    0x40
> > +
> > +/***********************************************************/
> > +/* RAM Migration State */
> > +typedef struct ArchMigrationState {
> > +    int use_xbrle;
> > +    int64_t xbrle_cache_size;
> > +} ArchMigrationState;
> > +
> > +static ArchMigrationState arch_mig_state;
> > +
> > +void arch_set_params(int blk_enable, int shared_base, int
> use_xbrle,
> > +        int64_t xbrle_cache_size, void *opaque)
> > +{
> > +    arch_mig_state.use_xbrle = use_xbrle;
> > +    arch_mig_state.xbrle_cache_size = xbrle_cache_size;
> > +}
> > +
> > +/***********************************************************/
> > +/* XBZRLE (Xor Binary Zero Run-Length Encoding) */
> > +typedef struct XBZRLEHeader {
> > +    uint8_t xh_flags;
> > +    uint16_t xh_len;
> > +    uint32_t xh_cksum;
> > +} XBZRLEHeader;
>
> This order of fields maximizes padding. Please reverse the order.
>

reversed order (see patch v4)

> > +
> > +static uint8_t dup_buf[TARGET_PAGE_SIZE];
> > +
> > +/***********************************************************/
> > +/* accounting */
> > +typedef struct AccountingInfo{
> > +    uint64_t dup_pages;
> > +    uint64_t norm_pages;
> > +    uint64_t xbrle_bytes;
> > +    uint64_t xbrle_pages;
> > +    uint64_t xbrle_overflow;
> > +    uint64_t xbrle_cache_lookup;
> > +    uint64_t xbrle_cache_hit;
> > +    uint64_t iterations;
> > +} AccountingInfo;
> > +
> > +static AccountingInfo acct_info;
> > +
> > +static void acct_clear(void)
> > +{
> > +    bzero(&acct_info, sizeof(acct_info));
>
> memset()

replaced with memset() for better portability (see patch V4)
>
> > +}
> > +
> > +uint64_t dup_mig_bytes_transferred(void)
> > +{
> > +    return acct_info.dup_pages;
> > +}
> > +
> > +uint64_t dup_mig_pages_transferred(void)
> > +{
> > +    return acct_info.dup_pages;
> > +}
> > +
> > +uint64_t norm_mig_bytes_transferred(void)
> > +{
> > +    return acct_info.norm_pages * TARGET_PAGE_SIZE;
> > +}
> > +
> > +uint64_t norm_mig_pages_transferred(void)
> > +{
> > +    return acct_info.norm_pages;
> > +}
> > +
> > +uint64_t xbrle_mig_bytes_transferred(void)
> > +{
> > +    return acct_info.xbrle_bytes;
> > +}
> > +
> > +uint64_t xbrle_mig_pages_transferred(void)
> > +{
> > +    return acct_info.xbrle_pages;
> > +}
> > +
> > +uint64_t xbrle_mig_pages_overflow(void)
> > +{
> > +    return acct_info.xbrle_overflow;
> > +}
> > +
> > +uint64_t xbrle_mig_pages_cache_hit(void)
> > +{
> > +    return acct_info.xbrle_cache_hit;
> > +}
> > +
> > +uint64_t xbrle_mig_pages_cache_lookup(void)
> > +{
> > +    return acct_info.xbrle_cache_lookup;
> > +}
> > +
> > +static void save_block_hdr(QEMUFile *f, RAMBlock *block,
> ram_addr_t offset,
> > +        int cont, int flag)
> > +{
> > +        qemu_put_be64(f, offset | cont | flag);
> > +        if (!cont) {
> > +                qemu_put_byte(f, strlen(block->idstr));
> > +                qemu_put_buffer(f, (uint8_t *)block->idstr,
> > +                                strlen(block->idstr));
>
> It's better to just write always sizeof(block->idstr) bytes.

sizeof(block->idstr) is 256 so I prefer to leave as strlen(block->idstr) which 
should work (additionaly this is consistant with other parts of the code).

>
> > +        }
> > +}
> > +
> > +#define ENCODING_FLAG_XBZRLE 0x1
> > +
> > +static int save_xbrle_page(QEMUFile *f, uint8_t *current_page,
> > +        ram_addr_t current_addr, RAMBlock *block,
> ram_addr_t offset, int cont)
> > +{
> > +    int encoded_len = 0, bytes_sent = 0;
> > +    XBZRLEHeader hdr = {0};
> > +    uint8_t *encoded, *old_page;
> > +
> > +    /* abort if page not cached */
> > +    acct_info.xbrle_cache_lookup++;
> > +    old_page = lru_lookup(current_addr);
> > +    if (!old_page) {
> > +        goto done;
> > +    }
> > +    acct_info.xbrle_cache_hit++;
> > +
> > +    /* XBZRLE (XOR+RLE) encoding */
> > +    encoded = (uint8_t *) qemu_malloc(TARGET_PAGE_SIZE);
> > +    encoded_len = xbzrle_encode(encoded, old_page, current_page,
> > +            TARGET_PAGE_SIZE);
> > +
> > +    if (encoded_len < 0) {
> > +        DPRINTF("XBZRLE encoding overflow - sending
> uncompressed\n");
> > +        acct_info.xbrle_overflow++;
> > +        goto done;
> > +    }
> > +
> > +    hdr.xh_len = encoded_len;
> > +    hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
> > +
> > +    /* Send XBZRLE compressed page */
> > +    save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
> > +    qemu_put_buffer(f, (uint8_t *) &hdr, sizeof(hdr));
>
> This fails when the host endianness does not match. Please save each
> field separately. Even better, switch to VMState.

switched to qemu_put_byte/qemu_put_be16/qemu_put_be32 (see patch v4)
>
> > +    qemu_put_buffer(f, encoded, encoded_len);
> > +    acct_info.xbrle_pages++;
> > +    bytes_sent = encoded_len + sizeof(hdr);
> > +    acct_info.xbrle_bytes += bytes_sent;
> > +
> > +done:
> > +    qemu_free(encoded);
> > +    return bytes_sent;
> > +}
> >
> >  static int is_dup_page(uint8_t *page, uint8_t ch)
> >  {
> > @@ -107,7 +266,7 @@ static int is_dup_page(uint8_t *page,
> uint8_t ch)
> >  static RAMBlock *last_block;
> >  static ram_addr_t last_offset;
> >
> > -static int ram_save_block(QEMUFile *f)
> > +static int ram_save_block(QEMUFile *f, int stage)
> >  {
> >     RAMBlock *block = last_block;
> >     ram_addr_t offset = last_offset;
> > @@ -120,6 +279,7 @@ static int ram_save_block(QEMUFile *f)
> >     current_addr = block->offset + offset;
> >
> >     do {
> > +        lru_free_cb_t free_cb = qemu_free;
> >         if (cpu_physical_memory_get_dirty(current_addr,
> MIGRATION_DIRTY_FLAG)) {
> >             uint8_t *p;
> >             int cont = (block == last_block) ?
> RAM_SAVE_FLAG_CONTINUE : 0;
> > @@ -128,28 +288,35 @@ static int ram_save_block(QEMUFile *f)
> >                                             current_addr +
> TARGET_PAGE_SIZE,
> >                                             MIGRATION_DIRTY_FLAG);
> >
> > -            p = block->host + offset;
> > +            if (arch_mig_state.use_xbrle) {
> > +                p = qemu_mallocz(TARGET_PAGE_SIZE);
> > +                memcpy(p, block->host + offset, TARGET_PAGE_SIZE);
> > +            } else {
> > +                p = block->host + offset;
> > +            }
> >
> >             if (is_dup_page(p, *p)) {
> > -                qemu_put_be64(f, offset | cont |
> RAM_SAVE_FLAG_COMPRESS);
> > -                if (!cont) {
> > -                    qemu_put_byte(f, strlen(block->idstr));
> > -                    qemu_put_buffer(f, (uint8_t *)block->idstr,
> > -                                    strlen(block->idstr));
> > -                }
> > +                save_block_hdr(f, block, offset, cont,
> RAM_SAVE_FLAG_COMPRESS);
> >                 qemu_put_byte(f, *p);
> >                 bytes_sent = 1;
> > -            } else {
> > -                qemu_put_be64(f, offset | cont |
> RAM_SAVE_FLAG_PAGE);
> > -                if (!cont) {
> > -                    qemu_put_byte(f, strlen(block->idstr));
> > -                    qemu_put_buffer(f, (uint8_t *)block->idstr,
> > -                                    strlen(block->idstr));
> > +                acct_info.dup_pages++;
> > +                if (arch_mig_state.use_xbrle && !*p) {
>
> Why !*p instead of !p?

I want to check for the common case in which p is an array of zeros as I want 
to optimize this case by referancing a static zero array (dup_buf) rather than 
having to allocate and initialize a page for inserting into the LRU cache.

>
> > +                    p = dup_buf;
> > +                    free_cb = NULL;
> >                 }
> > +            } else if (stage == 2 && arch_mig_state.use_xbrle) {
> > +                bytes_sent = save_xbrle_page(f, p,
> current_addr, block,
> > +                    offset, cont);
> > +            }
> > +            if (!bytes_sent) {
> > +                save_block_hdr(f, block, offset, cont,
> RAM_SAVE_FLAG_PAGE);
> >                 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> >                 bytes_sent = TARGET_PAGE_SIZE;
> > +                acct_info.norm_pages++;
> > +            }
> > +            if (arch_mig_state.use_xbrle) {
> > +                lru_insert(current_addr, p, free_cb);
> >             }
> > -
> >             break;
> >         }
> >
> > @@ -221,6 +388,9 @@ int ram_save_live(Monitor *mon,
> QEMUFile *f, int stage, void *opaque)
> >
> >     if (stage < 0) {
> >         cpu_physical_memory_set_dirty_tracking(0);
> > +        if (arch_mig_state.use_xbrle) {
> > +            lru_fini();
> > +        }
> >         return 0;
> >     }
> >
> > @@ -235,6 +405,11 @@ int ram_save_live(Monitor *mon,
> QEMUFile *f, int stage, void *opaque)
> >         last_block = NULL;
> >         last_offset = 0;
> >
> > +        if (arch_mig_state.use_xbrle) {
> > +
> lru_init(arch_mig_state.xbrle_cache_size/TARGET_PAGE_SIZE, 0);
> > +            acct_clear();
> > +        }
> > +
> >         /* Make sure all dirty bits are set */
> >         QLIST_FOREACH(block, &ram_list.blocks, next) {
> >             for (addr = block->offset; addr < block->offset
> + block->length;
> > @@ -264,8 +439,9 @@ int ram_save_live(Monitor *mon,
> QEMUFile *f, int stage, void *opaque)
> >     while (!qemu_file_rate_limit(f)) {
> >         int bytes_sent;
> >
> > -        bytes_sent = ram_save_block(f);
> > +        bytes_sent = ram_save_block(f, stage);
> >         bytes_transferred += bytes_sent;
> > +        acct_info.iterations++;
> >         if (bytes_sent == 0) { /* no more blocks */
> >             break;
> >         }
> > @@ -285,19 +461,66 @@ int ram_save_live(Monitor *mon,
> QEMUFile *f, int stage, void *opaque)
> >         int bytes_sent;
> >
> >         /* flush all remaining blocks regardless of rate limiting */
> > -        while ((bytes_sent = ram_save_block(f)) != 0) {
> > +        while ((bytes_sent = ram_save_block(f, stage))) {
> >             bytes_transferred += bytes_sent;
> >         }
> >         cpu_physical_memory_set_dirty_tracking(0);
> > +        if (arch_mig_state.use_xbrle) {
> > +            lru_fini();
> > +        }
> >     }
> >
> >     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> >
> >     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE
> / bwidth;
> >
> > +    DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n",
> expected_time,
> > +        migrate_max_downtime());
> > +
> >     return (stage == 2) && (expected_time <=
> migrate_max_downtime());
> >  }
> >
> > +static int load_xbrle(QEMUFile *f, ram_addr_t addr, void *host)
> > +{
> > +    int len, rc = -1;
> > +    uint8_t *encoded;
> > +    XBZRLEHeader hdr = {0};
> > +
> > +    /* extract RLE header */
> > +    qemu_get_buffer(f, (uint8_t *) &hdr, sizeof(hdr));
> > +    if (!(hdr.xh_flags & ENCODING_FLAG_XBZRLE)) {
> > +        fprintf(stderr, "Failed to load XZBRLE page -
> wrong compression!\n");
> > +        goto done;
> > +    }
> > +
> > +    if (hdr.xh_len > TARGET_PAGE_SIZE) {
> > +        fprintf(stderr, "Failed to load XZBRLE page - len
> overflow!\n");
> > +        goto done;
> > +    }
> > +
> > +    /* load data and decode */
> > +    encoded = (uint8_t *) qemu_malloc(hdr.xh_len);
> > +    qemu_get_buffer(f, encoded, hdr.xh_len);
> > +
> > +    /* decode RLE */
> > +    len = xbzrle_decode(host, host, encoded, hdr.xh_len);
> > +    if (len == -1) {
> > +        fprintf(stderr, "Failed to load XBZRLE page -
> decode error!\n");
> > +        goto done;
> > +    }
> > +
> > +    if (len != TARGET_PAGE_SIZE) {
> > +        fprintf(stderr, "Failed to load XBZRLE page - size
> %d expected %d!\n",
> > +            len, TARGET_PAGE_SIZE);
> > +        goto done;
> > +    }
> > +
> > +    rc = 0;
> > +done:
> > +    qemu_free(encoded);
> > +    return rc;
> > +}
> > +
> >  static inline void *host_from_stream_offset(QEMUFile *f,
> >                                             ram_addr_t offset,
> >                                             int flags)
> > @@ -328,16 +551,38 @@ static inline void
> *host_from_stream_offset(QEMUFile *f,
> >     return NULL;
> >  }
> >
> > +static inline void *host_from_stream_offset_versioned(int
> version_id,
> > +        QEMUFile *f, ram_addr_t offset, int flags)
> > +{
> > +        void *host;
> > +        if (version_id == 3) {
> > +                host = qemu_get_ram_ptr(offset);
> > +        } else {
> > +                host = host_from_stream_offset(f, offset, flags);
> > +        }
> > +        if (!host) {
> > +            fprintf(stderr, "Failed to convert RAM address to host"
> > +                    " for offset 0x%lX!\n", offset);
> > +            abort();
> > +        }
> > +        return host;
> > +}
> > +
> >  int ram_load(QEMUFile *f, void *opaque, int version_id)
> >  {
> >     ram_addr_t addr;
> > -    int flags;
> > +    int flags, ret = 0;
> > +    static uint64_t seq_iter;
> > +
> > +    seq_iter++;
> >
> >     if (version_id < 3 || version_id > 4) {
> > -        return -EINVAL;
> > +        ret = -EINVAL;
> > +        goto done;
> >     }
> >
> >     do {
> > +        void *host;
> >         addr = qemu_get_be64(f);
> >
> >         flags = addr & ~TARGET_PAGE_MASK;
> > @@ -346,7 +591,8 @@ int ram_load(QEMUFile *f, void *opaque,
> int version_id)
> >         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
> >             if (version_id == 3) {
> >                 if (addr != ram_bytes_total()) {
> > -                    return -EINVAL;
> > +                    ret = -EINVAL;
> > +                    goto done;
> >                 }
> >             } else {
> >                 /* Synchronize RAM block list */
> > @@ -365,8 +611,10 @@ int ram_load(QEMUFile *f, void
> *opaque, int version_id)
> >
> >                     QLIST_FOREACH(block, &ram_list.blocks, next) {
> >                         if (!strncmp(id, block->idstr,
> sizeof(id))) {
> > -                            if (block->length != length)
> > -                                return -EINVAL;
> > +                            if (block->length != length) {
> > +                                ret = -EINVAL;
> > +                                goto done;
> > +                            }
> >                             break;
> >                         }
> >                     }
> > @@ -374,7 +622,8 @@ int ram_load(QEMUFile *f, void *opaque,
> int version_id)
> >                     if (!block) {
> >                         fprintf(stderr, "Unknown ramblock
> \"%s\", cannot "
> >                                 "accept migration\n", id);
> > -                        return -EINVAL;
> > +                        ret = -EINVAL;
> > +                        goto done;
> >                     }
> >
> >                     total_ram_bytes -= length;
> > @@ -383,17 +632,10 @@ int ram_load(QEMUFile *f, void
> *opaque, int version_id)
> >         }
> >
> >         if (flags & RAM_SAVE_FLAG_COMPRESS) {
> > -            void *host;
> >             uint8_t ch;
> >
> > -            if (version_id == 3)
> > -                host = qemu_get_ram_ptr(addr);
> > -            else
> > -                host = host_from_stream_offset(f, addr, flags);
> > -            if (!host) {
> > -                return -EINVAL;
> > -            }
> > -
> > +            host = host_from_stream_offset_versioned(version_id,
> > +                            f, addr, flags);
> >             ch = qemu_get_byte(f);
> >             memset(host, ch, TARGET_PAGE_SIZE);
> >  #ifndef _WIN32
> > @@ -403,21 +645,28 @@ int ram_load(QEMUFile *f, void
> *opaque, int version_id)
> >             }
> >  #endif
> >         } else if (flags & RAM_SAVE_FLAG_PAGE) {
> > -            void *host;
> > -
> > -            if (version_id == 3)
> > -                host = qemu_get_ram_ptr(addr);
> > -            else
> > -                host = host_from_stream_offset(f, addr, flags);
> > -
> > +            host = host_from_stream_offset_versioned(version_id,
> > +                            f, addr, flags);
> >             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> > +        } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
> > +            host = host_from_stream_offset_versioned(version_id,
> > +                            f, addr, flags);
> > +            if (load_xbrle(f, addr, host) < 0) {
> > +                ret = -EINVAL;
> > +                goto done;
> > +            }
> >         }
> > +
> >         if (qemu_file_has_error(f)) {
> > -            return -EIO;
> > +            ret = -EIO;
> > +            goto done;
> >         }
> >     } while (!(flags & RAM_SAVE_FLAG_EOS));
> >
> > -    return 0;
> > +done:
> > +    DPRINTF("Completed load of VM with exit code %d seq
> iteration %ld\n",
> > +            ret, seq_iter);
> > +    return ret;
> >  }
> >
> >  void qemu_service_io(void)
> > diff --git a/block-migration.c b/block-migration.c
> > index 3e66f49..504df70 100644
> > --- a/block-migration.c
> > +++ b/block-migration.c
> > @@ -689,7 +689,8 @@ static int block_load(QEMUFile *f, void
> *opaque, int version_id)
> >     return 0;
> >  }
> >
> > -static void block_set_params(int blk_enable, int
> shared_base, void *opaque)
> > +static void block_set_params(int blk_enable, int shared_base,
> > +        int use_xbrle, int64_t xbrle_cache_size, void *opaque)
> >  {
> >     block_mig_state.blk_enable = blk_enable;
> >     block_mig_state.shared_base = shared_base;
> > diff --git a/hash.h b/hash.h
> > new file mode 100644
> > index 0000000..54abf7e
> > --- /dev/null
> > +++ b/hash.h
> > @@ -0,0 +1,72 @@
> > +#ifndef _LINUX_HASH_H
> > +#define _LINUX_HASH_H
> > +/* Fast hashing routine for ints,  longs and pointers.
> > +   (C) 2002 William Lee Irwin III, IBM */
> > +
> > +/*
> > + * Knuth recommends primes in approximately golden ratio
> to the maximum
> > + * integer representable by a machine word for
> multiplicative hashing.
> > + * Chuck Lever verified the effectiveness of this technique:
> > + * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
> > + *
> > + * These primes are chosen to be bit-sparse, that is operations on
> > + * them can use shifts and additions instead of multiplications for
> > + * machines where multiplications are slow.
> > + */
> > +
> > +typedef uint64_t u64;
> > +typedef uint32_t u32;
> > +#define BITS_PER_LONG TARGET_LONG_BITS
> > +
> > +/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
> > +#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
> > +/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
> > +#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
> > +
> > +#if BITS_PER_LONG == 32
> > +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
> > +#define hash_long(val, bits) hash_32(val, bits)
> > +#elif BITS_PER_LONG == 64
> > +#define hash_long(val, bits) hash_64(val, bits)
> > +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
> > +#else
> > +#error Wordsize not 32 or 64
> > +#endif
> > +
> > +static inline u64 hash_64(u64 val, unsigned int bits)
> > +{
> > +       u64 hash = val;
> > +
> > +       /*  Sigh, gcc can't optimise this alone like it
> does for 32 bits. */
> > +       u64 n = hash;
> > +       n <<= 18;
> > +       hash -= n;
> > +       n <<= 33;
> > +       hash -= n;
> > +       n <<= 3;
> > +       hash += n;
> > +       n <<= 3;
> > +       hash -= n;
> > +       n <<= 4;
> > +       hash += n;
> > +       n <<= 2;
> > +       hash += n;
> > +
> > +       /* High bits are more random, so use them. */
> > +       return hash >> (64 - bits);
> > +}
> > +
> > +static inline u32 hash_32(u32 val, unsigned int bits)
> > +{
> > +       /* On some cpus multiply is faster, on others gcc
> will do shifts */
> > +       u32 hash = val * GOLDEN_RATIO_PRIME_32;
> > +
> > +       /* High bits are more random, so use them. */
> > +       return hash >> (32 - bits);
> > +}
> > +
> > +static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
> > +{
> > +       return hash_long((unsigned long)ptr, bits);
> > +}
> > +#endif /* _LINUX_HASH_H */
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > old mode 100644
> > new mode 100755
> > index e5585ba..e49d5be
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -717,24 +717,27 @@ ETEXI
> >
> >     {
> >         .name       = "migrate",
> > -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> > -        .params     = "[-d] [-b] [-i] uri",
> > -        .help       = "migrate to URI (using -d to not
> wait for completion)"
> > -                     "\n\t\t\t -b for migration without
> shared storage with"
> > -                     " full copy of disk\n\t\t\t -i for
> migration without "
> > -                     "shared storage with incremental copy
> of disk "
> > -                     "(base image shared between src and
> destination)",
> > +        .args_type  = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> > +        .params     = "[-d] [-b] [-i] [-x] uri",
> > +        .help       = "migrate to URI"
> > +                      "\n\t -d to not wait for completion"
> > +                      "\n\t -b for migration without
> shared storage with"
> > +                      " full copy of disk"
> > +                      "\n\t -i for migration without"
> > +                      " shared storage with incremental
> copy of disk"
> > +                      " (base image shared between source
> and destination)"
> > +                      "\n\t -x to use XBRLE page delta
> compression",
> >         .user_print = monitor_user_noop,
> >        .mhandler.cmd_new = do_migrate,
> >     },
> >
> > -
> >  STEXI
> > address@hidden migrate [-d] [-b] [-i] @var{uri}
> > address@hidden migrate [-d] [-b] [-i] [-x] @var{uri}
> >  @findex migrate
> >  Migrate to @var{uri} (using -d to not wait for completion).
> >        -b for migration with full copy of disk
> >        -i for migration with incremental copy of disk (base
> image is shared)
> > +    -x to use XBRLE page delta compression
> >  ETEXI
> >
> >     {
> > @@ -753,10 +756,23 @@ Cancel the current VM migration.
> >  ETEXI
> >
> >     {
> > +        .name       = "migrate_set_cachesize",
> > +        .args_type  = "value:s",
> > +        .params     = "value",
> > +        .help       = "set cache size (in MB) for XBRLE
> migrations",
> > +        .mhandler.cmd = do_migrate_set_cachesize,
> > +    },
> > +
> > +STEXI
> > address@hidden migrate_set_cachesize @var{value}
> > +Set cache size (in MB) for xbrle migrations.
> > +ETEXI
> > +
> > +    {
> >         .name       = "migrate_set_speed",
> >         .args_type  = "value:o",
> >         .params     = "value",
> > -        .help       = "set maximum speed (in bytes) for
> migrations. "
> > +        .help       = "set maximum XBRLE cache size (in
> bytes) for migrations. "
> >        "Defaults to MB if no size suffix is specified, ie.
> B/K/M/G/T",
> >         .user_print = monitor_user_noop,
> >         .mhandler.cmd_new = do_migrate_set_speed,
> > diff --git a/hw/hw.h b/hw/hw.h
> > index 9d2cfc2..aa336ec 100644
> > --- a/hw/hw.h
> > +++ b/hw/hw.h
> > @@ -239,7 +239,8 @@ static inline void
> qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
> >  int64_t qemu_ftell(QEMUFile *f);
> >  int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
> >
> > -typedef void SaveSetParamsHandler(int blk_enable, int
> shared, void * opaque);
> > +typedef void SaveSetParamsHandler(int blk_enable, int shared,
> > +        int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> >  typedef void SaveStateHandler(QEMUFile *f, void *opaque);
> >  typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile
> *f, int stage,
> >                                  void *opaque);
> > diff --git a/lru.c b/lru.c
> > new file mode 100644
> > index 0000000..bad65d1
> > --- /dev/null
> > +++ b/lru.c
> > @@ -0,0 +1,151 @@
> > +#include <assert.h>
> > +#include <math.h>
> > +#include "lru.h"
> > +#include "qemu-queue.h"
> > +#include "hash.h"
> > +
> > +typedef struct CacheItem {
> > +    ram_addr_t it_addr;
> > +    uint8_t *it_data;
> > +    lru_free_cb_t it_free;
> > +    QCIRCLEQ_ENTRY(CacheItem) it_lru_next;
> > +    QCIRCLEQ_ENTRY(CacheItem) it_bucket_next;
> > +} CacheItem;
> > +
> > +typedef QCIRCLEQ_HEAD(, CacheItem) CacheBucket;
> > +static CacheBucket *page_hash;
> > +static int64_t cache_table_size;
> > +static uint64_t cache_max_items;
> > +static int64_t cache_num_items;
> > +static uint8_t cache_hash_bits;
> > +
> > +static QCIRCLEQ_HEAD(page_lru, CacheItem) page_lru;
> > +
> > +static uint64_t next_pow_of_2(uint64_t v)
> > +{
> > +    v--;
> > +    v |= v >> 1;
> > +    v |= v >> 2;
> > +    v |= v >> 4;
> > +    v |= v >> 8;
> > +    v |= v >> 16;
> > +    v |= v >> 32;
> > +    v++;
> > +    return v;
> > +}
> > +
> > +static uint8_t count_hash_bits(uint64_t v)
> > +{
> > +    uint8_t bits = 0;
> > +
> > +    while (!(v & 1)) {
> > +        v = v >> 1;
> > +        bits++;
> > +    }
> > +    return bits;
> > +}
>
> I think we have clz() which could be used.
>
> > +
> > +void lru_init(int64_t max_items, void *param)
> > +{
> > +    int i;
> > +
> > +    cache_num_items = 0;
> > +    cache_max_items = max_items;
> > +    /* add 20% to table size to reduce collisions */
> > +    cache_table_size = next_pow_of_2(1.2 * max_items);
> > +    cache_hash_bits = count_hash_bits(cache_table_size);
> > +
> > +    QCIRCLEQ_INIT(&page_lru);
> > +
> > +    page_hash = qemu_mallocz(sizeof(CacheBucket) *
> cache_table_size);
> > +    assert(page_hash);
> > +    for (i = 0; i < cache_table_size; i++) {
> > +        QCIRCLEQ_INIT(&page_hash[i]);
> > +    }
> > +}
> > +
> > +static CacheBucket *page_bucket_list(ram_addr_t addr)
> > +{
> > +    return &page_hash[hash_long(addr, cache_hash_bits)];
> > +}
> > +
> > +static void do_lru_remove(CacheItem *it)
> > +{
> > +    assert(it);
> > +
> > +    QCIRCLEQ_REMOVE(&page_lru, it, it_lru_next);
> > +    QCIRCLEQ_REMOVE(page_bucket_list(it->it_addr), it,
> it_bucket_next);
> > +    if (it->it_free) {
> > +        (*it->it_free)(it->it_data);
> > +    }
> > +    qemu_free(it);
> > +    cache_num_items--;
> > +}
> > +
> > +static int do_lru_remove_first(void)
> > +{
> > +    CacheItem *first;
> > +
> > +    if (QCIRCLEQ_EMPTY(&page_lru)) {
> > +        return -1;
> > +    }
> > +    first = QCIRCLEQ_FIRST(&page_lru);
> > +    do_lru_remove(first);
> > +    return 0;
> > +}
> > +
> > +
> > +void lru_fini(void)
> > +{
> > +    while (!do_lru_remove_first())
> > +    ;
>
> Braces, indentation.
>
> > +    qemu_free(page_hash);
> > +}
> > +
> > +static CacheItem *do_lru_lookup(ram_addr_t addr)
> > +{
> > +    CacheBucket *head = page_bucket_list(addr);
> > +    CacheItem *it;
> > +
> > +    if (QCIRCLEQ_EMPTY(head)) {
> > +        return NULL;
> > +    }
> > +    QCIRCLEQ_FOREACH(it, head, it_bucket_next) {
> > +        if (addr == it->it_addr) {
> > +            return it;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +uint8_t *lru_lookup(ram_addr_t addr)
> > +{
> > +    CacheItem *it = do_lru_lookup(addr);
> > +    return it ? it->it_data : NULL;
> > +}
> > +
> > +void lru_insert(ram_addr_t addr, uint8_t *data,
> lru_free_cb_t free_cb)
> > +{
> > +    CacheItem *it;
> > +
> > +    /* remove old if item exists */
> > +    it = do_lru_lookup(addr);
> > +    if (it) {
> > +        do_lru_remove(it);
> > +    }
> > +
> > +    /* evict LRU if require free space */
> > +    if (cache_num_items == cache_max_items) {
> > +        do_lru_remove_first();
> > +    }
> > +
> > +    /* add new entry */
> > +    it = qemu_mallocz(sizeof(*it));
> > +    it->it_addr = addr;
> > +    it->it_data = data;
> > +    it->it_free = free_cb;
> > +    QCIRCLEQ_INSERT_HEAD(page_bucket_list(addr), it,
> it_bucket_next);
> > +    QCIRCLEQ_INSERT_TAIL(&page_lru, it, it_lru_next);
> > +    cache_num_items++;
> > +}
> > +
> > diff --git a/lru.h b/lru.h
> > new file mode 100644
> > index 0000000..6c70095
> > --- /dev/null
> > +++ b/lru.h
> > @@ -0,0 +1,13 @@
> > +#ifndef _LRU_H_
> > +#define _LRU_H_
> > +
> > +#include <unistd.h>
> > +#include <stdint.h>
> > +#include "cpu-all.h"
> > +typedef void (*lru_free_cb_t)(void *);
> > +void lru_init(ssize_t num_items, void *param);
> > +void lru_fini(void);
> > +void lru_insert(ram_addr_t id, uint8_t *pdata,
> lru_free_cb_t free_cb);
> > +uint8_t *lru_lookup(ram_addr_t addr);
> > +#endif
> > +
> > diff --git a/migration-exec.c b/migration-exec.c
> > index 14718dd..fe8254a 100644
> > --- a/migration-exec.c
> > +++ b/migration-exec.c
> > @@ -67,7 +67,9 @@ MigrationState
> *exec_start_outgoing_migration(Monitor *mon,
> >                                              int64_t
> bandwidth_limit,
> >                                              int detach,
> >                                              int blk,
> > -                                             int inc)
> > +                          int inc,
> > +                          int use_xbrle,
> > +                          int64_t xbrle_cache_size)
> >  {
> >     FdMigrationState *s;
> >     FILE *f;
> > @@ -99,6 +101,8 @@ MigrationState
> *exec_start_outgoing_migration(Monitor *mon,
> >
> >     s->mig_state.blk = blk;
> >     s->mig_state.shared = inc;
> > +    s->mig_state.use_xbrle = use_xbrle;
> > +    s->mig_state.xbrle_cache_size = xbrle_cache_size;
> >
> >     s->state = MIG_STATE_ACTIVE;
> >     s->mon = NULL;
> > diff --git a/migration-fd.c b/migration-fd.c
> > index 6d14505..4a1ddbd 100644
> > --- a/migration-fd.c
> > +++ b/migration-fd.c
> > @@ -56,7 +56,9 @@ MigrationState
> *fd_start_outgoing_migration(Monitor *mon,
> >                                            int64_t bandwidth_limit,
> >                                            int detach,
> >                                            int blk,
> > -                                           int inc)
> > +                        int inc,
> > +                        int use_xbrle,
> > +                        int64_t xbrle_cache_size)
> >  {
> >     FdMigrationState *s;
> >
> > @@ -82,6 +84,8 @@ MigrationState
> *fd_start_outgoing_migration(Monitor *mon,
> >
> >     s->mig_state.blk = blk;
> >     s->mig_state.shared = inc;
> > +    s->mig_state.use_xbrle = use_xbrle;
> > +    s->mig_state.xbrle_cache_size = xbrle_cache_size;
> >
> >     s->state = MIG_STATE_ACTIVE;
> >     s->mon = NULL;
> > diff --git a/migration-tcp.c b/migration-tcp.c
> > index b55f419..4ca5bf6 100644
> > --- a/migration-tcp.c
> > +++ b/migration-tcp.c
> > @@ -81,7 +81,9 @@ MigrationState
> *tcp_start_outgoing_migration(Monitor *mon,
> >                                              int64_t
> bandwidth_limit,
> >                                              int detach,
> >                                             int blk,
> > -                                            int inc)
> > +                         int inc,
> > +                         int use_xbrle,
> > +                         int64_t xbrle_cache_size)
> >  {
> >     struct sockaddr_in addr;
> >     FdMigrationState *s;
> > @@ -101,6 +103,8 @@ MigrationState
> *tcp_start_outgoing_migration(Monitor *mon,
> >
> >     s->mig_state.blk = blk;
> >     s->mig_state.shared = inc;
> > +    s->mig_state.use_xbrle = use_xbrle;
> > +    s->mig_state.xbrle_cache_size = xbrle_cache_size;
> >
> >     s->state = MIG_STATE_ACTIVE;
> >     s->mon = NULL;
> > diff --git a/migration-unix.c b/migration-unix.c
> > index 57232c0..0813902 100644
> > --- a/migration-unix.c
> > +++ b/migration-unix.c
> > @@ -80,7 +80,9 @@ MigrationState
> *unix_start_outgoing_migration(Monitor *mon,
> >                                              int64_t
> bandwidth_limit,
> >                                              int detach,
> >                                              int blk,
> > -                                             int inc)
> > +                          int inc,
> > +                          int use_xbrle,
> > +                          int64_t xbrle_cache_size)
> >  {
> >     FdMigrationState *s;
> >     struct sockaddr_un addr;
> > @@ -100,6 +102,8 @@ MigrationState
> *unix_start_outgoing_migration(Monitor *mon,
> >
> >     s->mig_state.blk = blk;
> >     s->mig_state.shared = inc;
> > +    s->mig_state.use_xbrle = use_xbrle;
> > +    s->mig_state.xbrle_cache_size = xbrle_cache_size;
> >
> >     s->state = MIG_STATE_ACTIVE;
> >     s->mon = NULL;
> > diff --git a/migration.c b/migration.c
> > old mode 100644
> > new mode 100755
> > index 9ee8b17..ccacf81
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -34,6 +34,11 @@
> >  /* Migration speed throttling */
> >  static uint32_t max_throttle = (32 << 20);
> >
> > +/* Migration XBRLE cache size */
> > +#define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
> > +
> > +static int64_t migrate_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
> > +
> >  static MigrationState *current_migration;
> >
> >  int qemu_start_incoming_migration(const char *uri)
> > @@ -80,6 +85,7 @@ int do_migrate(Monitor *mon, const QDict
> *qdict, QObject **ret_data)
> >     int detach = qdict_get_try_bool(qdict, "detach", 0);
> >     int blk = qdict_get_try_bool(qdict, "blk", 0);
> >     int inc = qdict_get_try_bool(qdict, "inc", 0);
> > +    int use_xbrle = qdict_get_try_bool(qdict, "xbrle", 0);
> >     const char *uri = qdict_get_str(qdict, "uri");
> >
> >     if (current_migration &&
> > @@ -90,17 +96,21 @@ int do_migrate(Monitor *mon, const
> QDict *qdict, QObject **ret_data)
> >
> >     if (strstart(uri, "tcp:", &p)) {
> >         s = tcp_start_outgoing_migration(mon, p,
> max_throttle, detach,
> > -                                         blk, inc);
> > +                                         blk, inc, use_xbrle,
> > +                                         migrate_cache_size);
> >  #if !defined(WIN32)
> >     } else if (strstart(uri, "exec:", &p)) {
> >         s = exec_start_outgoing_migration(mon, p,
> max_throttle, detach,
> > -                                          blk, inc);
> > +                                          blk, inc, use_xbrle,
> > +                                          migrate_cache_size);
> >     } else if (strstart(uri, "unix:", &p)) {
> >         s = unix_start_outgoing_migration(mon, p,
> max_throttle, detach,
> > -                                          blk, inc);
> > +                                          blk, inc, use_xbrle,
> > +                                          migrate_cache_size);
> >     } else if (strstart(uri, "fd:", &p)) {
> >         s = fd_start_outgoing_migration(mon, p,
> max_throttle, detach,
> > -                                        blk, inc);
> > +                                        blk, inc, use_xbrle,
> > +                                        migrate_cache_size);
> >  #endif
> >     } else {
> >         monitor_printf(mon, "unknown migration protocol:
> %s\n", uri);
> > @@ -185,6 +195,36 @@ static void
> migrate_print_status(Monitor *mon, const char *name,
> >                         qdict_get_int(qdict, "total") >> 10);
> >  }
> >
> > +static void migrate_print_ram_status(Monitor *mon, const
> char *name,
> > +                                 const QDict *status_dict)
> > +{
> > +    QDict *qdict;
> > +    uint64_t overflow, cache_hit, cache_lookup;
> > +
> > +    qdict = qobject_to_qdict(qdict_get(status_dict, name));
> > +
> > +    monitor_printf(mon, "transferred %s: %" PRIu64 "
> kbytes\n", name,
> > +                        qdict_get_int(qdict, "bytes") >> 10);
> > +    monitor_printf(mon, "transferred %s: %" PRIu64 "
> pages\n", name,
> > +                        qdict_get_int(qdict, "pages"));
> > +    overflow = qdict_get_int(qdict, "overflow");
> > +    if (overflow > 0) {
> > +        monitor_printf(mon, "overflow %s: %" PRIu64 "
> pages\n", name,
> > +            overflow);
> > +    }
> > +    cache_hit = qdict_get_int(qdict, "cache-hit");
> > +    if (cache_hit > 0) {
> > +        monitor_printf(mon, "cache-hit %s: %" PRIu64 "
> pages\n", name,
> > +            cache_hit);
> > +    }
> > +    cache_lookup = qdict_get_int(qdict, "cache-lookup");
> > +    if (cache_lookup > 0) {
> > +        monitor_printf(mon, "cache-lookup %s: %" PRIu64 "
> pages\n", name,
> > +            cache_lookup);
> > +    }
> > +
> > +}
> > +
> >  void do_info_migrate_print(Monitor *mon, const QObject *data)
> >  {
> >     QDict *qdict;
> > @@ -198,6 +238,18 @@ void do_info_migrate_print(Monitor
> *mon, const QObject *data)
> >         migrate_print_status(mon, "ram", qdict);
> >     }
> >
> > +    if (qdict_haskey(qdict, "ram-duplicate")) {
> > +        migrate_print_ram_status(mon, "ram-duplicate", qdict);
> > +    }
> > +
> > +    if (qdict_haskey(qdict, "ram-normal")) {
> > +        migrate_print_ram_status(mon, "ram-normal", qdict);
> > +    }
> > +
> > +    if (qdict_haskey(qdict, "ram-xbrle")) {
> > +        migrate_print_ram_status(mon, "ram-xbrle", qdict);
> > +    }
> > +
> >     if (qdict_haskey(qdict, "disk")) {
> >         migrate_print_status(mon, "disk", qdict);
> >     }
> > @@ -214,6 +266,23 @@ static void migrate_put_status(QDict
> *qdict, const char *name,
> >     qdict_put_obj(qdict, name, obj);
> >  }
> >
> > +static void migrate_put_ram_status(QDict *qdict, const char *name,
> > +                               uint64_t bytes, uint64_t pages,
> > +                               uint64_t overflow, uint64_t
> cache_hit,
> > +                               uint64_t cache_lookup)
> > +{
> > +    QObject *obj;
> > +
> > +    obj = qobject_from_jsonf("{ 'bytes': %" PRId64 ", "
> > +                               "'pages': %" PRId64 ", "
> > +                               "'overflow': %" PRId64 ", "
> > +                               "'cache-hit': %" PRId64 ", "
> > +                               "'cache-lookup': %" PRId64 " }",
> > +                               bytes, pages, overflow, cache_hit,
> > +                               cache_lookup);
> > +    qdict_put_obj(qdict, name, obj);
> > +}
> > +
> >  void do_info_migrate(Monitor *mon, QObject **ret_data)
> >  {
> >     QDict *qdict;
> > @@ -228,6 +297,21 @@ void do_info_migrate(Monitor *mon,
> QObject **ret_data)
> >             migrate_put_status(qdict, "ram",
> ram_bytes_transferred(),
> >                                ram_bytes_remaining(),
> ram_bytes_total());
> >
> > +            if (s->use_xbrle) {
> > +                migrate_put_ram_status(qdict, "ram-duplicate",
> > +                                   dup_mig_bytes_transferred(),
> > +
> dup_mig_pages_transferred(), 0, 0, 0);
> > +                migrate_put_ram_status(qdict, "ram-normal",
> > +                                   norm_mig_bytes_transferred(),
> > +
> norm_mig_pages_transferred(), 0, 0, 0);
> > +                migrate_put_ram_status(qdict, "ram-xbrle",
> > +                                   xbrle_mig_bytes_transferred(),
> > +                                   xbrle_mig_pages_transferred(),
> > +                                   xbrle_mig_pages_overflow(),
> > +                                   xbrle_mig_pages_cache_hit(),
> > +                                   xbrle_mig_pages_cache_lookup());
> > +            }
> > +
> >             if (blk_mig_active()) {
> >                 migrate_put_status(qdict, "disk",
> blk_mig_bytes_transferred(),
> >                                    blk_mig_bytes_remaining(),
> > @@ -341,7 +425,8 @@ void migrate_fd_connect(FdMigrationState *s)
> >
> >     DPRINTF("beginning savevm\n");
> >     ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
> > -                                  s->mig_state.shared);
> > +                                  s->mig_state.shared,
> s->mig_state.use_xbrle,
> > +                                  s->mig_state.xbrle_cache_size);
> >     if (ret < 0) {
> >         DPRINTF("failed, %d\n", ret);
> >         migrate_fd_error(s);
> > @@ -448,3 +533,27 @@ int migrate_fd_close(void *opaque)
> >     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> >     return s->close(s);
> >  }
> > +
> > +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
> > +{
> > +    ssize_t bytes;
> > +    const char *value = qdict_get_str(qdict, "value");
> > +
> > +    bytes = strtosz(value, NULL);
> > +    if (bytes < 0) {
> > +        monitor_printf(mon, "invalid cache size: %s\n", value);
> > +        return;
> > +    }
> > +
> > +    /* On 32-bit hosts, QEMU is limited by virtual address space */
> > +    if (bytes > (2047 << 20) && HOST_LONG_BITS == 32) {
> > +        monitor_printf(mon, "cache can't exceed 2047 MB
> RAM limit on host\n");
> > +        return;
> > +    }
> > +    if (bytes != (uint64_t) bytes) {
> > +        monitor_printf(mon, "cache size too large\n");
> > +        return;
> > +    }
> > +    migrate_cache_size = bytes;
> > +}
> > +
> > diff --git a/migration.h b/migration.h
> > index d13ed4f..6dc0543 100644
> > --- a/migration.h
> > +++ b/migration.h
> > @@ -32,6 +32,8 @@ struct MigrationState
> >     void (*release)(MigrationState *s);
> >     int blk;
> >     int shared;
> > +    int use_xbrle;
> > +    int64_t xbrle_cache_size;
> >  };
> >
> >  typedef struct FdMigrationState FdMigrationState;
> > @@ -76,7 +78,9 @@ MigrationState
> *exec_start_outgoing_migration(Monitor *mon,
> >                                              int64_t
> bandwidth_limit,
> >                                              int detach,
> >                                              int blk,
> > -                                             int inc);
> > +                          int inc,
> > +                          int use_xbrle,
> > +                          int64_t xbrle_cache_size);
> >
> >  int tcp_start_incoming_migration(const char *host_port);
> >
> > @@ -85,7 +89,9 @@ MigrationState
> *tcp_start_outgoing_migration(Monitor *mon,
> >                                             int64_t bandwidth_limit,
> >                                             int detach,
> >                                             int blk,
> > -                                            int inc);
> > +                         int inc,
> > +                         int use_xbrle,
> > +                         int64_t xbrle_cache_size);
> >
> >  int unix_start_incoming_migration(const char *path);
> >
> > @@ -94,7 +100,9 @@ MigrationState
> *unix_start_outgoing_migration(Monitor *mon,
> >                                              int64_t
> bandwidth_limit,
> >                                              int detach,
> >                                              int blk,
> > -                                             int inc);
> > +                          int inc,
> > +                          int use_xbrle,
> > +                          int64_t xbrle_cache_size);
> >
> >  int fd_start_incoming_migration(const char *path);
> >
> > @@ -103,7 +111,9 @@ MigrationState
> *fd_start_outgoing_migration(Monitor *mon,
> >                                            int64_t bandwidth_limit,
> >                                            int detach,
> >                                            int blk,
> > -                                           int inc);
> > +                        int inc,
> > +                        int use_xbrle,
> > +                        int64_t xbrle_cache_size);
> >
> >  void migrate_fd_monitor_suspend(FdMigrationState *s, Monitor *mon);
> >
> > @@ -134,4 +144,11 @@ static inline FdMigrationState
> *migrate_to_fms(MigrationState *mig_state)
> >     return container_of(mig_state, FdMigrationState, mig_state);
> >  }
> >
> > +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
> > +
> > +void arch_set_params(int blk_enable, int shared_base,
> > +        int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> > +
> > +int xbrle_mig_active(void);
> > +
> >  #endif
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index 793cf1c..8fbe64b 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -431,13 +431,16 @@ EQMP
> >
> >     {
> >         .name       = "migrate",
> > -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> > -        .params     = "[-d] [-b] [-i] uri",
> > -        .help       = "migrate to URI (using -d to not
> wait for completion)"
> > -                     "\n\t\t\t -b for migration without
> shared storage with"
> > -                     " full copy of disk\n\t\t\t -i for
> migration without "
> > -                     "shared storage with incremental copy
> of disk "
> > -                     "(base image shared between src and
> destination)",
> > +        .args_type  = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> > +        .params     = "[-d] [-b] [-i] [-x] uri",
> > +        .help       = "migrate to URI"
> > +                      "\n\t -d to not wait for completion"
> > +                      "\n\t -b for migration without
> shared storage with"
> > +                      " full copy of disk"
> > +                      "\n\t -i for migration without"
> > +                      " shared storage with incremental
> copy of disk"
> > +                      " (base image shared between source
> and destination)"
> > +                      "\n\t -x to use XBRLE page delta
> compression",
> >         .user_print = monitor_user_noop,
> >        .mhandler.cmd_new = do_migrate,
> >     },
> > @@ -453,6 +456,7 @@ Arguments:
> >  - "blk": block migration, full disk copy (json-bool, optional)
> >  - "inc": incremental disk copy (json-bool, optional)
> >  - "uri": Destination URI (json-string)
> > +- "xbrle": to use XBRLE page delta compression
> >
> >  Example:
> >
> > @@ -494,6 +498,31 @@ Example:
> >  EQMP
> >
> >     {
> > +        .name       = "migrate_set_cachesize",
> > +        .args_type  = "value:s",
> > +        .params     = "value",
> > +        .help       = "set cache size (in MB) for xbrle
> migrations",
> > +        .mhandler.cmd = do_migrate_set_cachesize,
> > +    },
> > +
> > +SQMP
> > +migrate_set_cachesize
> > +---------------------
> > +
> > +Set cache size to be used by XBRLE migration
> > +
> > +Arguments:
> > +
> > +- "value": cache size in bytes (json-number)
> > +
> > +Example:
> > +
> > +-> { "execute": "migrate_set_cachesize", "arguments": {
> "value": 500M } }
> > +<- { "return": {} }
> > +
> > +EQMP
> > +
> > +    {
> >         .name       = "migrate_set_speed",
> >         .args_type  = "value:f",
> >         .params     = "value",
> > diff --git a/savevm.c b/savevm.c
> > index 4e49765..93b512b 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -1141,7 +1141,8 @@ int register_savevm(DeviceState *dev,
> >                     void *opaque)
> >  {
> >     return register_savevm_live(dev, idstr, instance_id, version_id,
> > -                                NULL, NULL, save_state,
> load_state, opaque);
> > +                                arch_set_params, NULL, save_state,
> > +                                load_state, opaque);
> >  }
> >
> >  void unregister_savevm(DeviceState *dev, const char
> *idstr, void *opaque)
> > @@ -1428,15 +1429,17 @@ static int vmstate_save(QEMUFile
> *f, SaveStateEntry *se)
> >  #define QEMU_VM_SUBSECTION           0x05
> >
> >  int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int
> blk_enable,
> > -                            int shared)
> > +                            int shared, int use_xbrle,
> > +                            int64_t xbrle_cache_size)
> >  {
> >     SaveStateEntry *se;
> >
> >     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
> >         if(se->set_params == NULL) {
> >             continue;
> > -       }
> > -       se->set_params(blk_enable, shared, se->opaque);
> > +        }
> > +        se->set_params(blk_enable, shared, use_xbrle,
> xbrle_cache_size,
> > +                se->opaque);
> >     }
> >
> >     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> > @@ -1577,7 +1580,7 @@ static int qemu_savevm_state(Monitor
> *mon, QEMUFile *f)
> >
> >     bdrv_flush_all();
> >
> > -    ret = qemu_savevm_state_begin(mon, f, 0, 0);
> > +    ret = qemu_savevm_state_begin(mon, f, 0, 0, 0, 0);
> >     if (ret < 0)
> >         goto out;
> >
> > diff --git a/sysemu.h b/sysemu.h
> > index b81a70e..eb53bf7 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -44,6 +44,16 @@ uint64_t ram_bytes_remaining(void);
> >  uint64_t ram_bytes_transferred(void);
> >  uint64_t ram_bytes_total(void);
> >
> > +uint64_t dup_mig_bytes_transferred(void);
> > +uint64_t dup_mig_pages_transferred(void);
> > +uint64_t norm_mig_bytes_transferred(void);
> > +uint64_t norm_mig_pages_transferred(void);
> > +uint64_t xbrle_mig_bytes_transferred(void);
> > +uint64_t xbrle_mig_pages_transferred(void);
> > +uint64_t xbrle_mig_pages_overflow(void);
> > +uint64_t xbrle_mig_pages_cache_lookup(void);
> > +uint64_t xbrle_mig_pages_cache_hit(void);
> > +
> >  int64_t cpu_get_ticks(void);
> >  void cpu_enable_ticks(void);
> >  void cpu_disable_ticks(void);
> > @@ -74,7 +84,8 @@ void qemu_announce_self(void);
> >  void main_loop_wait(int nonblocking);
> >
> >  int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int
> blk_enable,
> > -                            int shared);
> > +                            int shared, int use_xbrle,
> > +                            int64_t xbrle_cache_size);
> >  int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
> >  int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
> >  void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
> > diff --git a/xbzrle.c b/xbzrle.c
> > new file mode 100644
> > index 0000000..4bfd4e5
> > --- /dev/null
> > +++ b/xbzrle.c
> > @@ -0,0 +1,125 @@
> > +#include <stdint.h>
> > +#include <string.h>
> > +#include <assert.h>
> > +#include "cpu-all.h"
> > +#include "xbzrle.h"
> > +
> > +typedef struct {
> > +    uint64_t c;
> > +    uint64_t num;
> > +} zero_encoding_t;
> > +
> > +typedef struct {
> > +    uint64_t c;
> > +} char_encoding_t;
> > +
> > +static int rle_encode(uint64_t *in, int slen, uint8_t
> *out, const int dlen)
> > +{
> > +    int dl = 0;
> > +    uint64_t cp = 0, c, run_len = 0;
> > +
> > +    if (slen <=  0)
> > +        return -1;
> > +
> > +    while (1) {
> > +        if (!slen)
> > +            break;
> > +        c = *in++;
> > +        slen--;
> > +        if (!(cp || c)) {
> > +            run_len++;
> > +        } else if (!cp) {
> > +            ((zero_encoding_t *)out)->c = cp;
>
> This looks like it could produce different results on LE and BE hosts.

this will not work if sever and receiver don't have the same endianess - rather 
than converting all to BE which would slow down XBZRLE encoding I added a new 
header magic field which serves as indicator that a byte-order switch occurred 
-if so then I correct before XBZRLE decoding on the receiver side.

>
> > +            ((zero_encoding_t *)out)->num = run_len;
> > +            dl += sizeof(zero_encoding_t);
> > +            out += sizeof(zero_encoding_t);
> > +            run_len = 1;
> > +        } else {
> > +            ((char_encoding_t *)out)->c = cp;
> > +            dl += sizeof(char_encoding_t);
> > +            out += sizeof(char_encoding_t);
> > +                }
> > +        cp = c;
> > +    }
> > +
> > +    if (!cp) {
> > +        ((zero_encoding_t *)out)->c = cp;
> > +        ((zero_encoding_t *)out)->num = run_len;
> > +        dl += sizeof(zero_encoding_t);
> > +        out += sizeof(zero_encoding_t);
> > +    } else {
> > +        ((char_encoding_t *)out)->c = cp;
> > +        dl += sizeof(char_encoding_t);
> > +        out += sizeof(char_encoding_t);
> > +    }
> > +    return dl;
> > +}
> > +
> > +static int rle_decode(const uint8_t *in, int slen,
> uint64_t *out, int dlen)
> > +{
> > +    int tb = 0;
> > +    uint64_t run_len, c;
> > +
> > +    while (slen > 0) {
> > +        c = ((char_encoding_t *) in)->c;
> > +        if (c) {
> > +            slen -= sizeof(char_encoding_t);
> > +            in += sizeof(char_encoding_t);
> > +            *out++ = c;
> > +            tb++;
> > +            continue;
> > +        }
> > +        run_len = ((zero_encoding_t *) in)->num;
> > +        slen -= sizeof(zero_encoding_t);
> > +        in += sizeof(zero_encoding_t);
> > +        while (run_len-- > 0) {
> > +            *out++ = c;
> > +            tb++;
> > +        }
> > +    }
> > +    return tb;
> > +}
> > +
> > +static void xor_encode_word(uint8_t *dst, const uint8_t *src1,
> > +    const uint8_t *src2)
> > +{
> > +    int len = TARGET_PAGE_SIZE / sizeof (uint64_t);
> > +    uint64_t *dstw = (uint64_t *) dst;
> > +    const uint64_t *srcw1 = (const uint64_t *) src1;
> > +    const uint64_t *srcw2 = (const uint64_t *) src2;
> > +
> > +    while (len--) {
> > +        *dstw++ = *srcw1++ ^ *srcw2++;
> > +    }
> > +}
> > +
> > +static uint8_t xor_buf[TARGET_PAGE_SIZE];
> > +static uint8_t xbzrle_buf[TARGET_PAGE_SIZE * 2];
> > +
> > +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old,
> const uint8_t *curr,
> > +    const size_t max_compressed_len)
> > +{
> > +    int compressed_len;
> > +
> > +    xor_encode_word(xor_buf, old, curr);
> > +    compressed_len = rle_encode((uint64_t *)xor_buf,
> > +        sizeof(xor_buf)/sizeof(uint64_t), xbzrle_buf,
> > +        sizeof(xbzrle_buf));
> > +    if (compressed_len > max_compressed_len) {
> > +        return -1;
> > +    }
> > +    memcpy(xbzrle, xbzrle_buf, compressed_len);
> > +    return compressed_len;
> > +}
> > +
> > +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const
> uint8_t *xbrle,
> > +    const size_t compressed_len)
> > +{
> > +    int len = rle_decode(xbrle, compressed_len,
> > +         (uint64_t *)xor_buf, sizeof(xor_buf)/sizeof(uint64_t));
> > +    if (len < 0) {
> > +        return len;
> > +    }
> > +    xor_encode_word(curr, old, xor_buf);
> > +    return len * sizeof(uint64_t);
> > +}
> > diff --git a/xbzrle.h b/xbzrle.h
> > new file mode 100644
> > index 0000000..dde7366
> > --- /dev/null
> > +++ b/xbzrle.h
> > @@ -0,0 +1,12 @@
> > +#ifndef _XBZRLE_H_
> > +#define _XBZRLE_H_
> > +
> > +#include <stdio.h>
> > +
> > +int xbzrle_encode(uint8_t *xbrle, const uint8_t *old,
> const uint8_t *curr,
> > +       const size_t len);
> > +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const
> uint8_t *xbrle,
> > +       const size_t len);
> > +
> > +#endif
> > +
> >
> >
>
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, (continued)
- Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Alexander Graf, 2011/08/02
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Paolo Bonzini, 2011/08/02
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Anthony Liguori, 2011/08/02
    - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Shribman, Aidan, 2011/08/04
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Stefan Hajnoczi, 2011/08/02
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Avi Kivity, 2011/08/02
- Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Stefan Hajnoczi, 2011/08/02
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Shribman, Aidan, 2011/08/08
    - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Stefan Hajnoczi, 2011/08/08
- Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Blue Swirl, 2011/08/02
  - Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps, Shribman, Aidan <=
Prev by Date: [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
Next by Date: Re: [Qemu-devel] [PATCH 2/3] Introduce a 'client_add' monitor command accepting an open FD
Previous by thread: Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps
Next by thread: [Qemu-devel] [PATCH v2 10/10] block: Use bdrv_co_* instead of synchronous versions in coroutines
Index(es):
- Date
- Thread