Re: [Qemu-devel] [RFC V6 00/33] QCOW2 deduplication core functionality


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC V6 00/33] QCOW2 deduplication core functionality
Date: Fri, 8 Feb 2013 14:49:43 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Feb 06, 2013 at 01:31:33PM +0100, Benoît Canet wrote:
> This patchset creates the core infrastructure for deduplication and enables it.

Here is a high-level overview of qcow2 dedup for others reviewing this series:

Data structures
---------------

Deduplication adds one new on-disk structure: the dedup hash table.  Each entry
contains:
 * Hash value of a data cluster (32 bytes)
 * Logical offset of the first cluster with these contents (8 bytes)
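As a rough sketch, an on-disk entry could be laid out like this (the struct
and field names are illustrative, not necessarily what the series uses; only
the field sizes come from the description above):

#include <stdint.h>

#define QCOW_DEDUP_HASH_SIZE 32          /* bytes of hash per data cluster */

typedef struct QCowDedupHashEntry {
    uint8_t  hash[QCOW_DEDUP_HASH_SIZE]; /* hash of the cluster contents */
    uint64_t first_logical_offset;       /* first cluster with these contents */
} QCowDedupHashEntry;                    /* 40 bytes per entry on disk */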

Unallocated clusters have no hash value, so the dedup hash table uses an L1
table to allow sparse allocation of dedup hash table entries.  Dedup hash table
blocks are accessed through dedup_cluster_cache, a cache of the L2 blocks.
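To illustrate the indirection, a minimal sketch of locating an entry through
the L1 table, analogous to qcow2's regular L1/L2 lookup (the helper name and
parameters are assumptions for the example, not the series' actual code):

#include <stdint.h>

/* Find the dedup hash table block and slot for a physical cluster. */
static inline void dedup_table_position(uint64_t physical_cluster_index,
                                        uint64_t entries_per_block,
                                        uint64_t *l1_index,
                                        uint64_t *entry_index)
{
    *l1_index    = physical_cluster_index / entries_per_block; /* which block */
    *entry_index = physical_cluster_index % entries_per_block; /* slot in it  */
}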

The dedup hash table is indexed by physical offset; in other words, it can be
used to look up the hash for a given offset into the image file.  This means it
cannot be used to find duplicate clusters.

This is solved by building an in-memory lookup tree when the file is opened.
The lookup tree maps hashes to the physical offset and first logical offset -
it is an inverse mapping.

Since the dedup hash table is already scanned on startup, the forward mapping
(physical offset to hash) is also loaded into an in-memory lookup tree.

Finally we have arrived at the two trees used for deduplication:
1. dedup_tree_by_hash: hash -> <physical offset, first logical offset>
2. dedup_tree_by_sect: physical offset -> <hash, first logical offset>

dedup_tree_by_sect is the in-memory version of the dedup hash table on disk.
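A minimal sketch of how the two trees could be built at open time, assuming
GLib's GTree and an illustrative node layout (the actual series may organize
this differently):

#include <glib.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_HASH_SIZE 32

typedef struct DedupNode {
    uint8_t  hash[DEDUP_HASH_SIZE];
    uint64_t physical_offset;
    uint64_t first_logical_offset;
} DedupNode;

static gint compare_hash(gconstpointer a, gconstpointer b)
{
    return memcmp(a, b, DEDUP_HASH_SIZE);
}

static gint compare_offset(gconstpointer a, gconstpointer b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Index the same node in both trees: once by hash (to find duplicates on
 * write) and once by physical offset (the in-memory copy of the on-disk
 * table). */
static void dedup_insert(GTree *dedup_tree_by_hash, GTree *dedup_tree_by_sect,
                         DedupNode *node)
{
    g_tree_insert(dedup_tree_by_hash, node->hash, node);
    g_tree_insert(dedup_tree_by_sect, &node->physical_offset, node);
}

/* Called at open time: scan the on-disk dedup hash table and populate
 * both trees with one node per allocated entry. */
static void dedup_build_trees(GTree **by_hash, GTree **by_sect)
{
    *by_hash = g_tree_new(compare_hash);
    *by_sect = g_tree_new(compare_offset);
    /* ... for each allocated on-disk entry: allocate a DedupNode,
     * fill it in and call dedup_insert() ... */
}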

Read operation
--------------

Reads are unaffected by dedup.

Write operation
---------------

Writes must be cluster-aligned/cluster-sized or else they incur extra reads to
load the untouched sectors (similar to copy-on-write).  This is necessary
because the hash is calculated over the entire cluster, so we always work in
cluster units.
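A sketch of the rounding involved (the helper name and signature are
illustrative, not the series' actual code):

#include <stdint.h>

/* Round a write out to cluster boundaries.  The bytes between the rounded
 * range and the caller's range are untouched and must be read back from
 * the image before the whole cluster can be hashed (similar to
 * copy-on-write).  cluster_size must be a power of two. */
static void dedup_round_to_clusters(uint64_t offset, uint64_t bytes,
                                    uint64_t cluster_size,
                                    uint64_t *cluster_start,
                                    uint64_t *cluster_bytes)
{
    uint64_t end = offset + bytes;

    *cluster_start = offset & ~(cluster_size - 1);
    *cluster_bytes = ((end + cluster_size - 1) & ~(cluster_size - 1))
                     - *cluster_start;
}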

When a duplicate is found, the refcount of the existing cluster is incremented.
Since refcounts are 16-bit there is a risk of overflow, for example with an
all-zero disk image.  A threshold is set at which deduplication is broken and a
new cluster is created, preventing refcount overflow.
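Sketched as a simple guard (the threshold value and names are assumptions for
illustration; only the idea of staying below the 16-bit limit comes from the
series):

#include <stdbool.h>
#include <stdint.h>

#define DEDUP_MAX_REFCOUNT 60000   /* illustrative: safely below 65535 */

/* If the existing cluster's refcount has hit the threshold, stop
 * deduplicating against it and allocate a fresh cluster instead. */
static bool dedup_can_share(uint16_t current_refcount)
{
    return current_refcount < DEDUP_MAX_REFCOUNT;
}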

Stefan


