
Re: [Qemu-devel] QCOW2 deduplication design


From: Benoît Canet
Subject: Re: [Qemu-devel] QCOW2 deduplication design
Date: Wed, 9 Jan 2013 17:40:14 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

> 
> What is the GTree indexed by physical offset used for?

It's used for two things: deletion and loading of the hashes.

-Deletion is a hook in the refcount code that triggers when zero is reached.
The only information the code has at that point is the physical offset of the
cluster about to be discarded. The hash must be cleared on disk, so a lookup by
offset is done. Another way would be to read the deleted cluster, compute its
hash and use the result to delete the hash from disk, but that seems like a
heavy procedure.

-When the hashes are loaded at startup, a cluster written later at the same
physical place can have created another hash superseding the first one.
The by-offset tree is used in this case to keep only the most recent hash for a
given cluster in memory (see the sketch below).
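
For concreteness, here is a minimal sketch of those two uses, assuming GLib's
GTree API; the type and function names (QCowHashNode, dedup_load_hash,
dedup_refcount_zero_hook) are illustrative, not the actual patch code, and
memory management of superseded nodes is elided:

#include <glib.h>
#include <stdint.h>

typedef struct QCowHashNode {
    uint8_t  hash[32];          /* hash of the 4KB cluster         */
    uint64_t physical_offset;   /* where the cluster lives on disk */
} QCowHashNode;

/* Order nodes by physical offset. */
static gint offset_cmp(gconstpointer a, gconstpointer b)
{
    uint64_t oa = *(const uint64_t *)a;
    uint64_t ob = *(const uint64_t *)b;
    return (oa > ob) - (oa < ob);
}

static GTree *dedup_new_by_offset_tree(void)
{
    return g_tree_new(offset_cmp);
}

/* Called for each hash record read from disk at startup: a later record
 * for the same physical offset supersedes the earlier one, so only the
 * most recent hash per cluster stays in memory. */
static void dedup_load_hash(GTree *by_offset, QCowHashNode *node)
{
    g_tree_remove(by_offset, &node->physical_offset);
    g_tree_insert(by_offset, &node->physical_offset, node);
}

/* Refcount hook: when a cluster's refcount drops to zero, find its hash
 * by physical offset, clear it on disk (elided) and drop it from memory,
 * without having to re-read and re-hash the discarded cluster. */
static void dedup_refcount_zero_hook(GTree *by_offset, uint64_t phys_offset)
{
    QCowHashNode *node = g_tree_lookup(by_offset, &phys_offset);
    if (node) {
        /* ... zero the hash in its L2 hash block on disk ... */
        g_tree_remove(by_offset, &phys_offset);
    }
}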

> > when a write is unaligned or smaller than a 4KB cluster the deduplication
> > code issues one or two reads to get the missing data required to build a
> > 4KB*n linear buffer.
> > The deduplication metrics code shows that this situation doesn't happen
> > with virtio and ext3 as a guest partition.
> 
> If the application uses O_DIRECT inside the guest you may see <4 KB
> requests even on ext3 guest file systems.  But in the buffered I/O
> case the file system will use 4 KB blocks or similar.

This means we can expect bad performance with some kinds of workloads.
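
To illustrate the read-modify-write cost of that case, here is a small sketch
of the padding computation, assuming a 4KB dedup cluster size; the names and
the head/tail split are illustrative only, not the patch's code:

#include <stdint.h>

#define DEDUP_CLUSTER_SIZE 4096

typedef struct {
    uint64_t head_read_offset, head_read_len; /* read before the write */
    uint64_t tail_read_offset, tail_read_len; /* read after the write  */
} DedupPadding;

/* Compute the (at most) two reads needed to extend a guest write into an
 * aligned 4KB*n linear buffer that can be hashed cluster by cluster. */
static DedupPadding dedup_padding(uint64_t offset, uint64_t bytes)
{
    DedupPadding p;
    uint64_t start = offset & ~(uint64_t)(DEDUP_CLUSTER_SIZE - 1);
    uint64_t end   = (offset + bytes + DEDUP_CLUSTER_SIZE - 1)
                     & ~(uint64_t)(DEDUP_CLUSTER_SIZE - 1);

    p.head_read_offset = start;
    p.head_read_len    = offset - start;          /* 0 if already aligned */
    p.tail_read_offset = offset + bytes;
    p.tail_read_len    = end - (offset + bytes);  /* 0 if end is aligned  */
    return p;
}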

> > The cluster is counted as duplicated and not rewritten on disk
> 
> This case is when identical data is rewritten in place?  No writes are
> required - this is the scenario where online dedup is faster than
> non-dedup because we avoid I/O entirely.

Yes, but experiments show that dedup is always faster in this case: it goes
at exactly the storage's native speed.
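
A minimal sketch of that fast path, reusing the hypothetical QCowHashNode type
from the sketch above and assuming a second GTree keyed by the hash with a
memcmp-style comparator:

#include <stdbool.h>

/* If the incoming data's hash already maps to the very cluster the write
 * targets, the data is identical and already in place: no I/O is needed. */
static bool dedup_is_rewrite_in_place(GTree *by_hash,
                                      const uint8_t hash[32],
                                      uint64_t target_offset)
{
    QCowHashNode *node = g_tree_lookup(by_hash, hash);
    return node != NULL && node->physical_offset == target_offset;
}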

> > I.5) cluster removal
> > When an L2 entry to a cluster becomes stale the qcow2 code decrements the
> > refcount.
> > When the refcount reaches zero the L2 hash block of the stale cluster
> > is written to clear the hash.
> > This happens often and requires the second GTree to find the hash by its
> > physical sector number
> 
> This happens often?  I'm surprised.  Thought this only happens when
> you delete snapshots or resize the image file?  Maybe I misunderstood
> this case.

Yes, the preliminary metrics code shows that cluster removal happens often.
Maybe some recurrent filesystem structure is written to disk first and then
overwritten (inode skeleton, or journal zeroing).

> > I.6) max refcount reached
> > The L2 hash block of the cluster is written in order to remember at next
> > startup that it must not be used anymore for deduplication. The hash is
> > dropped from the gtrees.
> 
> Interesting case.  This means you can no longer take snapshots
> containing this cluster because we cannot track references :(.
> 
> Worst case: guest fills the disk with the same 4 KB data (e.g.
> zeroes).  There is only a single data cluster but the refcount is
> maxed out.  Now it is not possible to take a snapshot.

Maybe I could just lower the maximum refcount the dedup code will reach,
leaving room for snapshots. That would need a way to differentiate the
snapshot case in the hook code path.
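
A sketch of that idea, assuming qcow2's default 16-bit refcounts; the cap
value and the names are arbitrary, just to show where the headroom for
snapshot references would come from:

#include <stdint.h>
#include <stdbool.h>

#define QCOW2_REFCOUNT_MAX  0xFFFFu                   /* default 16-bit refcounts */
#define DEDUP_REFCOUNT_MAX  (QCOW2_REFCOUNT_MAX / 2)  /* leave room for snapshots */

/* Instead of deduplicating until the format's maximum is hit (section I.6),
 * stop earlier so snapshots can still add references to the cluster. */
static bool dedup_can_add_reference(uint16_t current_refcount)
{
    return current_refcount < DEDUP_REFCOUNT_MAX;
}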

Regards

Benoît


