
Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Tue, 07 Sep 2010 10:40:46 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6

On 09/07/2010 09:51 AM, Avi Kivity wrote:

I'll let Stefan address most of this.

     uint32_t first_cluster;       /* in clusters */

First cluster of what?

This should probably be header_size /* in clusters */ because that's what it really means.
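To make the rename concrete, here is a sketch of the header layout under discussion. The field names and types are illustrative, not the final QED format; the point is that a `header_size` counted in clusters directly tells you where data begins:

```c
#include <stdint.h>

/* Illustrative header sketch; fields beyond those discussed in the
 * thread are placeholders, not the real on-disk layout. */
typedef struct {
    uint32_t magic;
    uint32_t cluster_size;   /* in bytes */
    uint32_t table_size;     /* L1/L2 table size, in clusters */
    uint32_t header_size;    /* was first_cluster: header length in
                              * clusters, so the first data cluster
                              * starts at header_size * cluster_size */
    uint64_t image_size;     /* logical image size, in bytes */
} QEDHeaderSketch;

/* Byte offset of the first cluster after the header. */
static uint64_t qed_first_data_offset(const QEDHeaderSketch *h)
{
    return (uint64_t)h->header_size * h->cluster_size;
}
```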


Need a checksum for the header.

Is that not a bit overkill for what we're doing?  What's the benefit?


The L2 link '''should''' be made after the data is in place on storage. However, when no ordering is enforced the worst case scenario is an L2 link to an unwritten cluster.

Or it may cause corruption if the physical file size is not committed, and L2 now points at a free cluster.

An fsync() will make sure the physical file size is committed. The metadata carries no additional integrity guarantees over the actual disk data, except that, to avoid internal corruption, we have to order the L2 and L1 writes.
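The ordering this implies, data first, then an fsync() barrier to commit the data and the physical file size, then the L2 link, can be sketched with plain POSIX calls. The helper name and offset parameters are illustrative, not QED's real code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: publish a newly allocated cluster with safe ordering.
 * The L2 link is written only after the data (and the file size
 * that covers it) are stable on disk. */
static int write_cluster_ordered(int fd, uint64_t data_off,
                                 const void *data, size_t len,
                                 uint64_t l2_entry_off,
                                 uint64_t cluster_off)
{
    if (pwrite(fd, data, len, data_off) != (ssize_t)len) {
        return -1;
    }
    /* Barrier: commit the data and the physical file size. */
    if (fsync(fd) != 0) {
        return -1;
    }
    /* Only now link the cluster from its L2 table entry. */
    if (pwrite(fd, &cluster_off, sizeof(cluster_off), l2_entry_off)
            != (ssize_t)sizeof(cluster_off)) {
        return -1;
    }
    return 0;
}
```

Without the fsync() the worst case is exactly the one described above: an L2 link to a cluster whose data, or whose file size, never made it to disk.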

As part of the read process, it's important to validate that L2 entries don't point to clusters beyond EOF. That indicates a corrupted (interrupted) I/O operation, and we need to treat the cluster as unallocated.
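A minimal sketch of that read-side check, with illustrative names; the real lookup path would also consult the cluster cache and in-flight allocations:

```c
#include <stdint.h>

#define QED_CLUSTER_UNALLOCATED 0

/* Sketch: return the usable cluster offset from an L2 entry, or
 * QED_CLUSTER_UNALLOCATED if the entry points past the current
 * physical file size (a torn allocation from an interrupted write). */
static uint64_t qed_check_l2_entry(uint64_t l2_entry,
                                   uint64_t file_size,
                                   uint32_t cluster_size)
{
    if (l2_entry == QED_CLUSTER_UNALLOCATED) {
        return QED_CLUSTER_UNALLOCATED;
    }
    if (l2_entry + cluster_size > file_size) {
        /* The L2 link was made but the file size covering the data
         * was never committed; treat the cluster as unallocated. */
        return QED_CLUSTER_UNALLOCATED;
    }
    return l2_entry;
}
```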


The L1 link '''must''' be made after the L2 cluster is in place on storage. If the order is reversed then the L1 table may point to a bogus L2 table. (Is this a problem since clusters are allocated at the end of the file?)

==Grow==
# If table_size * TABLE_NOFFSETS < new_image_size, fail -EOVERFLOW. The L1 table is not big enough.

With a variable-height tree, we allocate a new root, link its first entry to the old root, and write the new header with updated root and height.

# Write new image_size header field.
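The two grow steps above can be sketched as follows. The function name, the unit of `new_image_size` (clusters addressable by L1 entries, matching the inequality quoted above), and the error convention are assumptions for illustration:

```c
#include <errno.h>
#include <stdint.h>

/* Sketch of the Grow steps: check L1 capacity, then commit the new
 * image size. Real code would write the header field to disk. */
static int qed_grow(uint32_t table_size, uint32_t table_noffsets,
                    uint64_t new_image_size, uint64_t *image_size)
{
    /* Step 1: the fixed-size L1 table must cover the new size. */
    if ((uint64_t)table_size * table_noffsets < new_image_size) {
        return -EOVERFLOW;
    }
    /* Step 2: write the new image_size header field. */
    *image_size = new_image_size;
    return 0;
}
```

With a variable-height tree, step 1 would instead allocate a new root when capacity runs out, so the grow path never fails with -EOVERFLOW.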

=Data integrity=
==Write==
Writes that complete before a flush must be stable when the flush completes.

If storage is interrupted (e.g. power outage) then writes in progress may be lost, stable, or partially completed. The storage must not be otherwise corrupted or inaccessible after it is restarted.

We can remove this requirement by copying-on-write any metadata write, and keeping two copies of the header (with version numbers and checksums).
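The two-copy header scheme can be sketched like this. Everything here is illustrative: the struct, the trivial placeholder checksum (a real format would use CRC32 or similar over the whole header), and the selection rule of "highest version whose checksum validates":

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: two header copies, each with a generation number and a
 * checksum, so a torn header write can be detected and the older
 * intact copy used instead. */
typedef struct {
    uint64_t version;   /* monotonically increasing generation */
    uint64_t checksum;  /* over the header contents; toy value here */
} HeaderCopy;

/* Toy checksum over the version field, for illustration only. */
static uint64_t header_checksum(const HeaderCopy *h)
{
    return h->version ^ 0x5145445f48445221ULL;
}

/* Pick the copy to trust: the highest version that validates,
 * or NULL if both copies are corrupt. */
static const HeaderCopy *pick_header(const HeaderCopy *a,
                                     const HeaderCopy *b)
{
    int a_ok = a->checksum == header_checksum(a);
    int b_ok = b->checksum == header_checksum(b);
    if (a_ok && b_ok) {
        return a->version >= b->version ? a : b;
    }
    return a_ok ? a : (b_ok ? b : NULL);
}
```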

QED today has the property that every piece of metadata and every cluster has a single, immutable location in the on-disk format. Defrag would relax this, but defrag can be slow.

Having an immutable on-disk location is a powerful property which eliminates a lot of complexity with respect to reference counting and dealing with free lists.

For the initial design I would avoid introducing something like this. One of the nice things about features is that we can introduce multi-level trees as a future feature if we really think it's the right thing to do.

But we should start with a simple design with high confidence and high performance, and then introduce features with the burden that we're absolutely sure we don't regress integrity or performance.

Regards,

Anthony Liguori

Enterprise storage will not corrupt on writes, but commodity storage may.




