[Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_syn
From: Anthony Liguori
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
Date: Tue, 24 Aug 2010 08:56:29 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6
On 08/24/2010 08:44 AM, Avi Kivity wrote:
> On 08/24/2010 04:40 PM, Anthony Liguori wrote:
>>> 1. Allocate a cluster (increase refcount table)
>>> 2. Link cluster to L2 table
>>> 3. Second operation makes it to disk; first still in pagecache
>>> 4. Crash
>>> 5. Dangling pointer from L2 to freed cluster
>> Yes, having this discussion in IRC.
>> The problem is that we maintain a refcount table.
> Are you sure that's the only issue?
No.
>> If we didn't do internal disk snapshots, we wouldn't have this
>> problem. IOW, VMDK doesn't have this problem so the answer to my
>> very first question is that qcow2 is too difficult a format to get
>> right.
> One doesn't follow from the other (though I'm no fan of internal
> snapshots, myself).
It does. Let's consider the failure scenarios:
1) guest submits write request
2) allocate extent
3) write data to disk (a)
4) write (a) completes
5) update reference count table for new extent (b)
6) write (b) completes
7) write extent table (c)
8) write (c) completes
9) complete guest write request
If this all happens in order and we lose power, the worst-case error is
that we leak a block, which isn't terrible.
But we're not guaranteed that this happens in order.
If (b) or (c) happen before (a), then the image is not corrupted but
data gets lost. That's okay because it's part of the guest contract.
If (c) happens before (b), then we've created an extent that's linked
from the extent table but still has a zero reference count. This is a
corrupt image.
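To make the case analysis above concrete, here is a minimal Python sketch that enumerates every combination of the three writes that could be on disk at the moment of a crash and labels the result. It is only a toy model of the argument in this mail, not qcow2 code: the names `classify_with_refcounts` and `crash_states_abc` are made up for illustration, and it assumes that, without ordering guarantees, any subset of (a), (b), (c) may have reached the disk.

```python
from itertools import combinations

# a = data write, b = reference count update, c = extent table update
WRITES_WITH_REFCOUNTS = ("a", "b", "c")


def crash_states_abc(writes=WRITES_WITH_REFCOUNTS):
    """Yield every subset of writes that might be durable when power is lost."""
    for n in range(len(writes) + 1):
        for subset in combinations(writes, n):
            yield subset


def classify_with_refcounts(durable):
    """Label the on-disk state if only the writes in `durable` survived."""
    durable = set(durable)
    if "c" in durable and "b" not in durable:
        # Extent is linked from the table but its refcount is still zero.
        return "corrupt"
    if "c" in durable and "a" not in durable:
        # Metadata is consistent, but the guest data never landed.
        return "data lost"
    if "b" in durable and "c" not in durable:
        # Refcount says "in use" but nothing points at the extent.
        return "leak"
    return "consistent"


outcomes_with_refcounts = {
    state: classify_with_refcounts(state) for state in crash_states_abc()
}

for state, outcome in outcomes_with_refcounts.items():
    print(",".join(state) or "(none)", "->", outcome)
```

In this model the only states labelled "corrupt" are the ones where (c) is durable and (b) is not, which matches the ordering problem described above.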
Let's consider if we eliminate the reference count table which means
eliminating internal snapshots.
1) guest submits write request
2) allocate extent
3) write data to disk (a)
4) write (a) completes
5) write extent table (c)
6) write (c) completes
7) complete guest write request
If this all happens in order and we lose power, we just leak a block.
It means we need a periodic fsck.
If (c) completes before (a), then it means that the image is not
corrupted but data gets lost. This is okay based on the guest contract.
And that's it. There is no scenario where the disk is corrupted.
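The same kind of enumeration can be done for the format without a reference count table. Again this is a sketch under the assumptions of this mail, with hypothetical names (`classify_without_refcounts`, `crash_states_ac`), and it treats any subset of the two writes (a) and (c) as a possible on-disk state at crash time.

```python
from itertools import combinations

# a = data write, c = extent table update (no reference count table)
WRITES_NO_REFCOUNTS = ("a", "c")


def crash_states_ac(writes=WRITES_NO_REFCOUNTS):
    """Yield every subset of writes that might be durable when power is lost."""
    for n in range(len(writes) + 1):
        for subset in combinations(writes, n):
            yield subset


def classify_without_refcounts(durable):
    """Label the on-disk state if only the writes in `durable` survived."""
    durable = set(durable)
    if "c" in durable and "a" not in durable:
        # Extent is linked, but the guest data never landed.
        return "data lost"
    if "a" in durable and "c" not in durable:
        # Space was consumed but no table entry refers to it.
        return "leak"
    return "consistent"


outcomes_without_refcounts = {
    state: classify_without_refcounts(state) for state in crash_states_ac()
}

for state, outcome in outcomes_without_refcounts.items():
    print(",".join(state) or "(none)", "->", outcome)
```

No state in this model comes out as "corrupt"; the worst outcomes are a leak or lost data for a request the guest was never told had completed.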
So in summary, neither situation is perfect, but scenario (1) can
result in a corrupted image whereas scenario (2) results in leakage.
The classic solution to this is fsck.
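As a rough illustration of what such an fsck pass would look for in the second scenario, the sketch below treats every cluster in the file that is neither metadata nor referenced by the extent table as leaked. The function `find_leaked_clusters` and its in-memory representation of the extent table are assumptions made for this example; a real checker would have to parse the on-disk format.

```python
def find_leaked_clusters(total_clusters, extent_table, metadata_clusters=frozenset()):
    """Return the sorted cluster indexes that nothing refers to.

    total_clusters    -- number of clusters in the image file
    extent_table      -- mapping of guest extent index -> host cluster index
    metadata_clusters -- clusters holding the header and tables themselves
    """
    referenced = set(extent_table.values()) | set(metadata_clusters)
    return sorted(set(range(total_clusters)) - referenced)


# Hypothetical image: 6 clusters, cluster 0 is metadata, three extents mapped.
example_extent_table = {0: 1, 1: 2, 5: 4}
leaked = find_leaked_clusters(6, example_extent_table, metadata_clusters={0})
print("leaked clusters:", leaked)
```

Clusters found this way could be handed back to the allocator, which is why a leak is recoverable while a dangling reference is not.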
Regards,
Anthony Liguori