
Re: [Qemu-devel] [PATCH v6 09/11] dump: Add API to write dump pages


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v6 09/11] dump: Add API to write dump pages
Date: Tue, 07 Jan 2014 23:37:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131118 Thunderbird/17.0.11

comments below

On 01/05/14 08:27, Qiao Nuohan wrote:
> These functions are used to write pages to the vmcore; the vmcore is written
> page by page. A page desc stores the information of a page, including its
> size, offset, compression format, etc.
> 
> Signed-off-by: Qiao Nuohan <address@hidden>
> ---
>  dump.c                |  258 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/dump.h |    9 ++
>  2 files changed, 267 insertions(+), 0 deletions(-)
> 
> diff --git a/dump.c b/dump.c
> index b4c40f2..848957c 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -25,6 +25,14 @@
>  #include "qapi/error.h"
>  #include "qmp-commands.h"
>  
> +#include <zlib.h>
> +#ifdef CONFIG_LZO
> +#include <lzo/lzo1x.h>
> +#endif
> +#ifdef CONFIG_SNAPPY
> +#include <snappy-c.h>
> +#endif
> +
>  static uint16_t cpu_convert_to_target16(uint16_t val, int endian)
>  {
>      if (endian == ELFDATA2LSB) {
> @@ -1140,6 +1148,256 @@ static void free_data_cache(DataCache *data_cache)
>      g_free(data_cache->buf);
>  }
>  
> +static size_t get_len_buf_out(size_t page_size, uint32_t flag_compress)
> +{
> +    size_t len_buf_out_zlib, len_buf_out_lzo, len_buf_out_snappy;
> +    size_t len_buf_out;
> +
> +    /* init buf_out */
> +    len_buf_out_zlib = len_buf_out_lzo = len_buf_out_snappy = 0;
> +
> +    /* buf size for zlib */
> +    len_buf_out_zlib = compressBound(page_size);
> +
> +    /* buf size for lzo */
> +#ifdef CONFIG_LZO
> +    if (flag_compress & DUMP_DH_COMPRESSED_LZO) {
> +        if (lzo_init() != LZO_E_OK) {
> +            /* return 0 to indicate lzo is unavailable */
> +            return 0;
> +        }
> +    }
> +
> +    /*
> +     * LZO will expand incompressible data by a little amount. please check the
> +     * following URL to see the expansion calculation:
> +     * http://www.oberhumer.com/opensource/lzo/lzofaq.php
> +     */
> +    len_buf_out_lzo = page_size + page_size / 16 + 64 + 3;
> +#endif
> +
> +#ifdef CONFIG_SNAPPY
> +    /* buf size for snappy */
> +    len_buf_out_snappy = snappy_max_compressed_length(page_size);
> +#endif
> +
> +    /* get the biggest that can store all kinds of compressed page */
> +    len_buf_out = MAX(len_buf_out_zlib,
> +                      MAX(len_buf_out_lzo, len_buf_out_snappy));
> +
> +    return len_buf_out;
> +}

seems OK

> +
> +/*
> + * search memory blocks to find the exact place of the specified page, then
> + * dump the page into buf. memory should be read page by page, or it may exceed
> + * the boundary and fail to read
> + */
> +static int readmem(void *bufptr, ram_addr_t addr, size_t size, DumpState *s)

Apologies, but I think this does warrant a respin, even though what I'm
going to say is kind of ridiculous...

The type of "addr" needs to be "hwaddr" here, not "ram_addr_t". The
latter is for subscripting RAMBlocks. The former (hwaddr) is what you
need when poking in guest-phys address space. See also "2.1. Scalars" in
HACKING.

I'm not worried about overflowing ram_addr_t or anything like that, it's
about the message ram_addr_t sends -- it's precisely the wrong type
here. (I guess ram_addr_t here could be a remnant from an early version
of your series, one that predated the introduction of guest_phys_blocks
in dump.)

> +{
> +    GuestPhysBlock *block;
> +
> +    QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
> +        if ((addr >= block->target_start) &&
> +            (addr + size <= block->target_end)) {
> +            memcpy(bufptr, block->host_addr + (addr - block->target_start),
> +                    size);
> +            return 0;
> +        }
> +    }
> +
> +    return -1;
> +}

I also wonder if you could speed this up by returning a pointer instead
of copying out the data. But that's tangential.

> +
> +/*
> + * check if the page is all 0
> + */
> +static inline bool is_zero_page(unsigned char *buf, long page_size)
> +{
> +    return buffer_is_zero(buf, page_size);
> +}

nice (except "page_size" should have type size_t, and *buf should be
const-qualified)

> +
> +static int write_dump_pages(DumpState *s)
> +{
> +    int ret = 0;
> +    DataCache page_desc, page_data;
> +    size_t len_buf_out, size_out;
> +#ifdef CONFIG_LZO
> +    lzo_bytep wrkmem = NULL;
> +#endif
> +    unsigned char *buf_out = NULL;
> +    off_t offset_desc, offset_data;
> +    PageDesc pd, pd_zero;
> +    uint64_t pfn_start, pfn_end, pfn;
> +    unsigned char buf[s->page_size];

Whoa, a VLA! :) I believe it's *very* non-idiomatic in the qemu source.
Please consider allocating it dynamically. (Of course others might point
out that I'm wrong.)

Even better, if readmem() could return a pointer into host memory
(rather than a deep copy), then you could drop "buf" completely (as an
array).

> +    MemoryMapping *memory_mapping;
> +    bool zero_page;
> +
> +    prepare_data_cache(&page_desc, s);
> +    prepare_data_cache(&page_data, s);
> +
> +    /* prepare buffer to store compressed data */
> +    len_buf_out = get_len_buf_out(s->page_size, s->flag_compress);
> +    if (len_buf_out == 0) {
> +        dump_error(s, "dump: failed to get length of output buffer.\n");
> +        goto out;
> +    }
> +
> +#ifdef CONFIG_LZO
> +    wrkmem = g_malloc(LZO1X_1_MEM_COMPRESS);
> +#endif

This could be further restricted with DUMP_DH_COMPRESSED_LZO.

> +
> +    buf_out = g_malloc(len_buf_out);
> +
> +    /* get offset of page_desc and page_data in dump file */
> +    offset_desc = s->offset_page;
> +    offset_data = offset_desc + sizeof(PageDesc) * s->num_dumpable;
> +    page_desc.offset = offset_desc;
> +    page_data.offset = offset_data;

I think that these offset initializations should happen inside
prepare_data_cache(). prepare_data_cache() should take the initial value
of "dc->offset" as a parameter. Sorry for not noticing it in the earlier
patch.

> +
> +    /*
> +     * init zero page's page_desc and page_data, because every zero page
> +     * uses the same page_data
> +     */
> +    pd_zero.size = s->page_size;
> +    pd_zero.flags = 0;
> +    pd_zero.offset = offset_data;
> +    pd_zero.page_flags = 0;

(cpu_to_le?)

> +    memset(buf, 0, pd_zero.size);
> +    ret = write_cache(&page_data, s->flag_flatten, buf, pd_zero.size);
> +    if (ret < 0) {
> +        dump_error(s, "dump: failed to write page data(zero page).\n");
> +        goto out;
> +    }

We're producing a zero page (to be referenced by zero, one, or more,
page descs). OK.

Also, I understand now how write_cache() + write_buffer() will operate.
For a regular (seekable) file, the caching saves a number of seeks, but
still the file is written through two parallel "offset streams". For a
flattened file, the cache size has a more visible consequence, namely
the number of MakedumpfileDataHeader's produced. OK.

pd_zero.size should be fine for write_cache(), but it's kind of hard to see:
- s->page_size is set to PAGE_SIZE in 10/11,
- data_cache->buf_size is set to BUFSIZE_DATA_CACHE (== PAGE_SIZE * 4)
in prepare_data_cache().

Hence my suggestion to add an assert to write_cache(), with a comment.

> +
> +    offset_data += pd_zero.size;
> +
> +    /* dump memory to vmcore page by page */
> +    QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
> +        pfn_start = paddr_to_pfn(memory_mapping->phys_addr, s->page_shift);
> +        pfn_end = paddr_to_pfn(memory_mapping->phys_addr +
> +                               memory_mapping->length, s->page_shift);

Again I think we're exploiting that both paging and filtering modes are
disabled for compressed dumps. (Same as in patch v6 07/11.) As I said
there, this seems to cause the memory mapping list to match the
guest_phys_block list identically. In that case though:

> +
> +        for (pfn = pfn_start; pfn < pfn_end; pfn++) {
> +            memset(buf, 0, s->page_size);
> +            ret = readmem(buf, pfn_to_paddr(pfn, s->page_shift), s->page_size,
> +                          s);

This nested loop seems a bit wasteful -- we iterate over the pages in
the guest-phys blocks, and we look up each page again in the list of
guest-phys blocks. I think these two loops could be folded into one, but
I'm not suggesting it at v6 :) The current code does appear OK.

(Also, if this loop is changed, then all other similar loops should be
synchronized with it in the series, including the one in patch 7.)


> +            if (ret < 0) {
> +                dump_error(s, "dump: failed to read memory.\n");
> +                goto out;
> +            }

Given this error check, I think at least the memset() before readmem()
is unnecessary.

> +
> +            /* check zero page */
> +            zero_page = is_zero_page(buf, s->page_size);
> +            if (zero_page) {

(The bool is never reused, I think you could simply call is_zero_page()
inside the "if".)

> +                ret = write_cache(&page_desc, s->flag_flatten, &pd_zero,
> +                                  sizeof(PageDesc));
> +                if (ret < 0) {
> +                    dump_error(s, "dump: failed to write page desc.\n");
> +                    goto out;
> +                }
> +            } else {
> +                /*
> +                 * not zero page, then:
> +                 * 1. compress the page
> +                 * 2. write the compressed page into the cache of page_data
> +                 * 3. get page desc of the compressed page and write it into 
> the
> +                 *    cache of page_desc
> +                 */
> +                size_out = len_buf_out;
> +                if ((s->flag_compress & DUMP_DH_COMPRESSED_ZLIB) &&
> +                        (compress2(buf_out, &size_out, buf, s->page_size,
> +                        Z_BEST_SPEED) == Z_OK) && (size_out < s->page_size)) {
> +                    pd.flags = DUMP_DH_COMPRESSED_ZLIB;
> +                    pd.size  = size_out;

(cpu_to_le?)

> +
> +                    ret = write_cache(&page_data, s->flag_flatten, buf_out,
> +                                      pd.size);
> +                    if (ret < 0) {
> +                        dump_error(s, "dump: failed to write page data.\n");
> +                        goto out;
> +                    }

OK I believe I understand the strategy. We have a priority list of
compression algorithms. For each page, the first one of:
- zlib
- lzo1x
- snappy
will be used that simultaneously:
- achieves actual compression,
- is enabled at build time,
- is selected at runtime.

Given that the runtime selection allows at most one compression
algorithm, this priority list is actually not a list, it's an exact choice.

And for that reason only, I won't say that "we have a small bug here,
namely compress2() can change 'size_out', and then we may call
lzo1x_1_compress() with the non-pristine size_out". :) Because if one of
the algorithms runs, we never try any other algorithm.

Both compression failure and compression without size reduction cause us
to fall back to saving the plaintext.

> +#ifdef CONFIG_LZO
> +                } else if ((s->flag_compress & DUMP_DH_COMPRESSED_LZO) &&
> +                        (lzo1x_1_compress(buf, s->page_size, buf_out,
> +                        &size_out, wrkmem) == LZO_E_OK) &&
> +                        (size_out < s->page_size)) {
> +                    pd.flags = DUMP_DH_COMPRESSED_LZO;
> +                    pd.size  = size_out;

(cpu_to_le?)

> +
> +                    ret = write_cache(&page_data, s->flag_flatten, buf_out,
> +                                      pd.size);
> +                    if (ret < 0) {
> +                        dump_error(s, "dump: failed to write page data.\n");
> +                        goto out;
> +                    }
> +#endif
> +#ifdef CONFIG_SNAPPY
> +                } else if ((s->flag_compress & DUMP_DH_COMPRESSED_SNAPPY) &&
> +                        (snappy_compress((char *)buf, s->page_size,
> +                        (char *)buf_out, (size_t *)&size_out) == SNAPPY_OK) &&

"&size_out" already has type pointer-to-size_t, no need to cast it.

> +                        (size_out < s->page_size)) {
> +                    pd.flags = DUMP_DH_COMPRESSED_SNAPPY;
> +                    pd.size  = size_out;

(cpu_to_le?)

> +
> +                    ret = write_cache(&page_data, s->flag_flatten, buf_out,
> +                                      pd.size);
> +                    if (ret < 0) {
> +                        dump_error(s, "dump: failed to write page data.\n");
> +                        goto out;
> +                    }
> +#endif
> +                } else {
> +                    pd.flags = 0;
> +                    pd.size = s->page_size;

(cpu_to_le?)

> +
> +                    ret = write_cache(&page_data, s->flag_flatten, buf,
> +                                      pd.size);
> +                    if (ret < 0) {
> +                        dump_error(s, "dump: failed to write page data.\n");
> +                        goto out;
> +                    }
> +                }
> +
> +                /* get and write page desc here */
> +                pd.page_flags = 0;
> +                pd.offset = offset_data;

(cpu_to_le?)


> +                offset_data += pd.size;

I wonder if you could drop "offset_data" completely, and use
"page_data.offset" instead. Hm, no, you can't, because
"page_data.offset" is updated only when flushing the page data cache,
but you need an up-to-date "offset_data" value at each page. OK.

> +
> +                ret = write_cache(&page_desc, s->flag_flatten, &pd,
> +                                  sizeof(PageDesc));
> +                if (ret < 0) {
> +                    dump_error(s, "dump: failed to write page desc.\n");
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    ret = sync_data_cache(&page_desc, s->flag_flatten);
> +    if (ret < 0) {
> +        dump_error(s, "dump: failed to sync cache for page_desc.\n");
> +        goto out;
> +    }
> +    ret = sync_data_cache(&page_data, s->flag_flatten);
> +    if (ret < 0) {
> +        dump_error(s, "dump: failed to sync cache for page_data.\n");
> +        goto out;
> +    }
> +
> +out:
> +    free_data_cache(&page_desc);
> +    free_data_cache(&page_data);
> +
> +#ifdef CONFIG_LZO
> +    g_free(wrkmem);
> +#endif
> +
> +    g_free(buf_out);
> +
> +    return ret;
> +}
> +
>  static ram_addr_t get_start_block(DumpState *s)
>  {
>      GuestPhysBlock *block;
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index ab44af8..382a3c3 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -40,6 +40,8 @@
>  
>  #define paddr_to_pfn(X, page_shift) \
>      (((unsigned long long)(X) >> (page_shift)) - ARCH_PFN_OFFSET)
> +#define pfn_to_paddr(X, page_shift) \
> +    (((unsigned long long)(X) + ARCH_PFN_OFFSET) << (page_shift))
>  
>  typedef struct ArchDumpInfo {
>      int d_machine;  /* Architecture */
> @@ -149,6 +151,13 @@ typedef struct DataCache {
>      off_t offset;       /* offset of the file */
>  } DataCache;
>  
> +typedef struct PageDesc {
> +    uint64_t offset;                   /* the offset of the page data*/
> +    uint32_t size;                  /* the size of this dump page */
> +    uint32_t flags;                 /* flags */
> +    uint64_t page_flags;            /* page flags */
> +} PageDesc;

- some whitespace damage near the first comment
- QEMU_PACKED probably missing from the type definition (although it
would be likely cosmetic given this specific structure)

> +
>  struct GuestPhysBlockList; /* memory_mapping.h */
>  int cpu_get_dump_info(ArchDumpInfo *info,
>                        const struct GuestPhysBlockList *guest_phys_blocks);
> 

I'm withholding my R-b due to the ram_addr_t <-> hwaddr "problem" only.

The rest would be just nice to have (maybe in another series :), so) I
don't insist.

Nice work!

Thanks
Laszlo


