qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification


From: Kevin Wolf
Subject: [Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification
Date: Fri, 12 Nov 2010 14:58:12 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10

Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
>  docs/specs/qed_spec.txt |  128 
> +++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 128 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/qed_spec.txt
> 
> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
> new file mode 100644
> index 0000000..e4425c8
> --- /dev/null
> +++ b/docs/specs/qed_spec.txt
> @@ -0,0 +1,128 @@
> +=Specification=
> +
> +The file format looks like this:
> +
> + +----------+----------+----------+-----+
> + | cluster0 | cluster1 | cluster2 | ... |
> + +----------+----------+----------+-----+
> +
> +The first cluster begins with the '''header'''.  The header contains 
> information about where regular clusters start; this allows the header to be 
> extensible and store extra information about the image file.  A regular 
> cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''.  L1 
> and L2 tables are composed of one or more contiguous clusters.
> +
> +Normally the file size will be a multiple of the cluster size.  If the file 
> size is not a multiple, extra information after the last cluster may not be 
> preserved if data is written.  Legitimate extra information should use space 
> between the header and the first regular cluster.
> +
> +All fields are little-endian.
> +
> +==Header==
> + Header {
> +     uint32_t magic;               /* QED\0 */
> + 
> +     uint32_t cluster_size;        /* in bytes */
> +     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
> +     uint32_t header_size;         /* in clusters */
> + 
> +     uint64_t features;            /* format feature bits */
> +     uint64_t compat_features;     /* compat feature bits */
> +     uint64_t l1_table_offset;     /* in bytes */
> +     uint64_t image_size;          /* total logical image size, in bytes */
> + 
> +     /* if (features & QED_F_BACKING_FILE) */
> +     uint32_t backing_filename_offset; /* in bytes from start of header */
> +     uint32_t backing_filename_size;   /* in bytes */
> + }
> +
> +Field descriptions:
> +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
> +* ''table_size'' must be a power of 2 in range [1, 16].
> +* ''header_size'' is the number of clusters used by the header and any 
> additional information stored before regular clusters.
> +* ''features'', ''compat_features'', and ''autoclear_features'' are file 
> format extension bitmaps.  They work as follows:
> +** An image with unknown ''features'' bits enabled must not be opened.  File 
> format changes that are not backwards-compatible must use ''features'' bits.
> +** An image with unknown ''compat_features'' bits enabled can be opened 
> safely.  The unknown features are simply ignored and represent 
> backwards-compatible changes to the file format.
> +** An image with unknown ''autoclear_features'' bits enable can be opened 
> safely after clearing the unknown bits.  This allows for backwards-compatible 
> changes to the file format which degrade gracefully and can be re-enabled 
> again by a new program later.

autoclear features aren't even part of the header in the spec.

> +* ''l1_table_offset'' is the offset of the first byte of the L1 table in the 
> image file and must be a multiple of ''cluster_size''.
> +* ''image_size'' is the block device size seen by the guest and must be a 
> multiple of 512 bytes.
> +* ''backing_filename'' is a string in (byte offset, byte size) form.  It is 
> not NUL-terminated and has no alignment constraints.
> +
> +Feature bits:
> +* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.  The backing 
> filename string is given in the ''backing_filename_{offset,size}'' fields and 
> may be an absolute path or relative to the image file.
> +* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
> +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk 
> image and no file format autodetection should be attempted.  This should be 
> used to ensure that raw backing images are never detected as an image format 
> if they happen to contain magic constants.
> +
> +There are currently no defined ''compat_features'' or ''autoclear_features'' 
> bits.
> +
> +Fields predicated on a feature bit are only used when that feature is set.  
> The fields always take up header space, regardless of whether or not the 
> feature bit is set.
> +
> +==Tables==
> +
> +Tables provide the translation from logical offsets in the block device to 
> cluster offsets in the file.
> +
> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
> +  
> + Table {
> +     uint64_t offsets[TABLE_NOFFSETS];
> + }
> +
> +The tables are organized as follows:
> +
> +                    +----------+
> +                    | L1 table |
> +                    +----------+
> +               ,------'  |  '------.
> +          +----------+   |    +----------+
> +          | L2 table |  ...   | L2 table |
> +          +----------+        +----------+
> +      ,------'  |  '------.
> + +----------+   |    +----------+
> + |   Data   |  ...   |   Data   |
> + +----------+        +----------+
> +
> +A table is made up of one or more contiguous clusters.  The table_size 
> header field determines table size for an image file.  For example, 
> cluster_size=64 KB and table_size=4 results in 256 KB tables.
> +
> +The logical image size must be less than or equal to the maximum possible 
> size of clusters rooted by the L1 table:
> + header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
> +
> +All offsets in L1 and L2 tables are cluster-aligned.  The least significant 
> bits up to ''cluster_size'' are reserved and must be zero.

I know what you mean here, but the text leaves things a bit unclear.
First I would expect a bit number instead of a byte offset for "bits up
to x". Second, cluster_size is the first bit not reserved, whereas your
description sounds to me as if it included cluster_size.

>  This may be used in future format extensions to store per-offset information.
> +
> +The following offsets have special meanings:
> +
> +===L2 table offsets===
> +* 0 - unallocated.  The L2 table is not yet allocated.
> +
> +===Data cluster offsets===
> +* 0 - unallocated.  The data cluster is not yet allocated.
> +
> +===Unallocated L2 tables and data clusters===
> +Reads to an unallocated area of the image file access the backing file.  If 
> there is no backing file, then zeroes are produced.  The backing file may be 
> smaller than the image file and reads of unallocated areas beyond the end of 
> the backing file produce zeroes.
> +
> +Writes to an unallocated area cause a new data clusters to be allocated, and 
> a new L2 table if that is also unallocated.  The new data cluster is 
> populated with data from the backing image (or zeroes if no backing image) 
> and the data being written.
> +
> +===Logical offset translation===
> +Logical offsets are translated into cluster offsets as follows:
> +
> +  table_bits table_bits    cluster_bits
> +  <--------> <--------> <--------------->
> + +----------+----------+-----------------+
> + | L1 index | L2 index |     byte offset |
> + +----------+----------+-----------------+
> + 
> +       Structure of a logical offset
> +
> + offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
> + 
> + def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
> +   l2_offset = l1_table[l1_index]
> +   l2_table = load_table(l2_offset)
> +   cluster_offset = l2_table[l2_index] & offset_mask
> +   return cluster_offset + byte_offset
> +
> +==Consistency checking==
> +
> +This section is informational and included to provide background on the use 
> of the QED_F_NEED_CHECK ''features'' bit.
> +
> +The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting 
> an operation that could leave the image in an inconsistent state if 
> interrupted by a crash or power failure.  A dirty image must be checked on 
> open because its metadata may not be consistent.
> +
> +Consistency check includes the following invariants:
> +# Each cluster is referenced once and only once.  It is an inconsistency to 
> have a cluster referenced more than once by L1 or L2 tables.  A cluster has 
> been leaked if it has no references.
> +# Offsets must be within the image file size and must be ''cluster_size'' 
> aligned.
> +# Table offsets must at least ''table_size'' * ''cluster_size'' bytes from 
> the end of the image file so that there is space for the entire table.
> +
> +The consistency check process starts by from ''l1_table_offset'' and scans 
> all L2 tables.  After the check completes with no other errors besides leaks, 
> the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.

Looks okay otherwise.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]