Re: [Qemu-devel] [RFC V6 00/33] QCOW2 deduplication core functionality
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC V6 00/33] QCOW2 deduplication core functionality
Date: Fri, 8 Feb 2013 14:49:43 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Feb 06, 2013 at 01:31:33PM +0100, Benoît Canet wrote:
> This patchset creates the core infrastructure for deduplication and enables it.
Here is a high-level overview of qcow2 dedup for others reviewing this series:
Data structures
---------------
Deduplication adds one new on-disk structure: the dedup hash table. Each entry
contains:
* Hash value of a data cluster (32 bytes)
* Logical offset of the first cluster with these contents (8 bytes)
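The entry layout above can be sketched as a simple serialization routine. This is an illustrative sketch only, assuming a 32-byte hash followed by a little-endian 8-byte offset; the field order, endianness, and names are assumptions, not the actual on-disk layout from the patch series.

```python
import struct

# Hypothetical layout of one dedup hash table entry: a 32-byte cluster
# hash followed by an 8-byte logical offset of the first cluster with
# these contents (assumed little-endian; illustrative, not QEMU's layout).
ENTRY_FMT = "<32sQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)  # 40 bytes per entry

def pack_entry(cluster_hash: bytes, first_logical_offset: int) -> bytes:
    """Serialize one dedup hash table entry."""
    return struct.pack(ENTRY_FMT, cluster_hash, first_logical_offset)

def unpack_entry(raw: bytes):
    """Deserialize one dedup hash table entry back into its fields."""
    cluster_hash, first_logical_offset = struct.unpack(ENTRY_FMT, raw)
    return cluster_hash, first_logical_offset
```

With the sizes given in the text, each entry occupies 40 bytes, so a dedup hash table block holds cluster_size / 40 entries.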
Unallocated clusters have no hash value so the dedup hash table uses an L1
table to allow sparse allocation of dedup hash table entries. Dedup hash table
blocks are accessed through dedup_cluster_cache, a cache of the L2 blocks.
The dedup hash table is indexed by physical offset; in other words, it can be
used to look up the hash for a given offset into the image file. This means it
cannot be used directly to find duplicate clusters.
This is solved by building an in-memory lookup tree when the file is opened.
The lookup tree maps hashes to the physical offset and first logical offset -
it is an inverse mapping.
Since the dedup hash table is already scanned on startup, the forward mapping
(physical offset to hash) is also loaded into an in-memory lookup tree.
Finally we have arrived at the two trees used for deduplication:
1. dedup_tree_by_hash: hash -> <physical offset, first logical offset>
2. dedup_tree_by_sect: physical offset -> <hash, first logical offset>
dedup_tree_by_sect is the in-memory version of the dedup hash table on disk.
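Building both trees in a single scan of the on-disk table can be sketched as follows. Plain dicts stand in for the actual in-memory tree structures, and the tuple layout of the scanned entries is an assumption for illustration.

```python
# Minimal sketch: one pass over the on-disk dedup hash table yields both
# the forward map (by physical offset) and its inverse (by hash).
# Dicts stand in for whatever tree structure the implementation uses.

def build_dedup_trees(on_disk_table):
    """on_disk_table: iterable of (physical_offset, cluster_hash,
    first_logical_offset) tuples read from the dedup hash table."""
    tree_by_hash = {}  # hash -> (physical offset, first logical offset)
    tree_by_sect = {}  # physical offset -> (hash, first logical offset)
    for phys, cluster_hash, logical in on_disk_table:
        tree_by_hash[cluster_hash] = (phys, logical)
        tree_by_sect[phys] = (cluster_hash, logical)
    return tree_by_hash, tree_by_sect
```

A write path can then probe tree_by_hash with the hash of an incoming cluster: a hit means a duplicate exists and its refcount can be incremented instead of allocating a new cluster.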
Read operation
--------------
Reads are unaffected by dedup.
Write operation
---------------
Writes must be cluster-aligned and cluster-sized or else they incur extra reads
to load the untouched sectors (similar to copy-on-write). This is necessary
because the hash is calculated over the entire cluster, so deduplication always
works in cluster units.
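The read-modify-write expansion described above can be sketched as computing which head and tail byte ranges must be read in before hashing. The 64 KB default cluster size is an assumption; the helper name is illustrative.

```python
CLUSTER_SIZE = 65536  # assumption: qcow2's default cluster size

def rmw_ranges(offset, length, cluster_size=CLUSTER_SIZE):
    """Return the (head, tail) byte ranges of untouched data that must be
    read so an unaligned write can be expanded to whole clusters before
    the per-cluster hash is computed. Empty ranges mean the write was
    already cluster-aligned and cluster-sized."""
    start = (offset // cluster_size) * cluster_size          # round down
    end = -(-(offset + length) // cluster_size) * cluster_size  # round up
    head = (start, offset)         # bytes before the write, same cluster
    tail = (offset + length, end)  # bytes after the write, same cluster
    return head, tail
```

An aligned, cluster-sized write produces two empty ranges and needs no extra reads, which is why aligned guest writes are the cheap case for dedup.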
When a duplicate is found, the refcount of the existing cluster is incremented.
Since refcounts are 16-bit there is a risk of overflow, for example with an
all-zero disk image. To prevent refcount overflow, a threshold is set at which
deduplication is broken and a new cluster is created instead.
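The overflow-avoidance check can be sketched as below. The actual threshold value used by the patch series is not stated here; REFCOUNT_MAX is the 16-bit ceiling and DEDUP_MAX_REFCOUNT is an illustrative margin below it.

```python
REFCOUNT_MAX = 2**16 - 1          # 16-bit refcount ceiling (65535)
DEDUP_MAX_REFCOUNT = REFCOUNT_MAX - 1  # illustrative threshold, not QEMU's

def should_dedup(existing_refcount: int) -> bool:
    """Reuse the existing cluster only while its refcount can still be
    incremented safely; past the threshold, break deduplication and
    allocate a fresh cluster for the same contents."""
    return existing_refcount < DEDUP_MAX_REFCOUNT
```

In the all-zero image example, once the shared zero cluster's refcount hits the threshold, the next identical write starts a second physical copy with its own refcount, trading a little space for correctness.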
Stefan