
Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Tue, 14 Sep 2010 11:46:17 +0100

On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier <address@hidden> wrote:
> Stefan Hajnoczi wrote:
>> Since there is no ordering imposed between the data write and metadata
>> update, the following scenarios may occur on crash:
>> 1. Neither data write nor metadata update reach the disk.  This is
>> fine, qed metadata has not been corrupted.
>> 2. Data reaches disk but metadata update does not.  We have leaked a
>> cluster but not corrupted metadata.  Leaked clusters can be detected
>> with qemu-img check.
>> 3. Metadata update reaches disk but data does not.  The interesting
>> case!  The L2 table now points to a cluster which is beyond the last
>> cluster in the image file.  Remember that file size is rounded down by
>> cluster size, so partial data writes are discarded and this case
>> applies.
>
> Better add:
>
> 4. File size is extended fully, but the data didn't all reach the disk.

This case is okay.

If a data cluster does not reach the disk but the file size is
increased there are two outcomes:
1. A leaked cluster if the L2 table update did not reach the disk.
2. A cluster with junk data, which is fine since the guest has no
guarantee the data safely landed on disk without completing a flush.
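The earlier case 3 relies on the file size being rounded down by cluster size, so a torn data write never counts as an allocated cluster. A sketch of that check (a hypothetical helper, not the actual qed code):

```python
def cluster_beyond_eof(offset, file_size, cluster_size=65536):
    """Return True if an L2 entry points at or past the last complete
    cluster.  The file size is rounded down by cluster size, so a
    partial trailing cluster is treated as if it was never written."""
    usable = (file_size // cluster_size) * cluster_size
    return offset >= usable

# a torn data write leaves a partial final cluster, which is ignored
torn_entry_invalid = cluster_beyond_eof(65536, 65536 + 100)
first_cluster_ok = not cluster_beyond_eof(0, 65536 + 100)
```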

A flush is performed after allocating new L2 tables and before linking
them into the L1 table.  Therefore clusters can be leaked but an
invalid L2 table can never be linked into the L1 table.
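That ordering can be sketched like this (a toy image layout, not the actual qed code; the offsets and the 8-byte little-endian L1 entry format are assumptions for illustration):

```python
import os
import tempfile

CLUSTER_SIZE = 65536

def link_new_l2(fd, l1_entry_off):
    """Allocate a new L2 table at EOF, flush it, and only then link it
    into the L1 table.  A crash before the flush merely leaks the
    cluster; L1 can never point at an unwritten L2 table."""
    l2_off = os.lseek(fd, 0, os.SEEK_END)        # allocate at end of file
    os.pwrite(fd, b"\0" * CLUSTER_SIZE, l2_off)  # write the empty L2 table
    os.fdatasync(fd)                             # L2 table is durable...
    # ...before the L1 entry is updated to reference it
    os.pwrite(fd, l2_off.to_bytes(8, "little"), l1_entry_off)
    return l2_off

# demo on a scratch image: cluster 0 holds the header and L1 table
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"\0" * CLUSTER_SIZE, 0)
new_l2 = link_new_l2(fd, 8)                      # hypothetical L1 entry at offset 8
linked = int.from_bytes(os.pread(fd, 8, 8), "little")
os.close(fd)
os.remove(path)
```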

> 5. Metadata is partially updated.
> 6. (Nasty) Metadata partial write has clobbered neighbouring
>   metadata which wasn't meant to be changed.  (This may happen up
>   to a sector size on normal hard disks - data is hard to come by.
>   This happens to a much larger file range on flash and RAIDs
>   sometimes - I call it the "radius of destruction").
>
> 6 can also happen when doing the L1 update mentioned earlier, in
> which case you might lose a much larger part of the guest image.

These two cases are problematic.  I've been thinking in terms of
atomic sector updates, not a model where updates can be partial or
even destructive at the byte level.  Do you have references where I
can read more about the radius of destruction ;)?

Transactional I/O solves this problem.  Checksums alone can detect the
problem but not fix it.  Duplicate metadata together with checksums
could be a solution but I haven't thought through the details.
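As a rough sketch of the duplicate-metadata idea (hypothetical record format, not qed): prefix each metadata copy with a checksum, update the copies one at a time with a flush in between, and on read take the first copy whose checksum validates.  A torn write can then corrupt at most one copy.

```python
import zlib

def pack_meta(payload: bytes) -> bytes:
    """Hypothetical on-disk record: payload prefixed by its CRC32."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload

def read_meta(copies):
    """Return the payload of the first copy whose checksum validates."""
    for raw in copies:
        crc, payload = int.from_bytes(raw[:4], "little"), raw[4:]
        if zlib.crc32(payload) == crc:
            return payload
    raise IOError("all metadata copies corrupt")

good = pack_meta(b"l1-table-v2")
torn = good[:8] + b"\xff" * (len(good) - 8)   # simulate a partial write
recovered = read_meta([torn, good])           # falls back to the intact copy
```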

Any other suggestions?

Time to peek at md and dm to see how they safeguard metadata.

Stefan


