
Re: [Qemu-devel] [RFC] Disk integrity in QEMU


From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Date: Sun, 12 Oct 2008 12:12:06 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Anthony Liguori wrote:

>> If I focus on the sentence "The I/O is synchronous, that is, at the
>> completion of a read(2) or write(2), data is guaranteed to have been
>> transferred.",

> It's extremely important to understand what the guarantee is. The guarantee is that upon completion of write(), the data will have been reported as written by the underlying storage subsystem. This does *not* mean that the data is on disk.


It means that as far as the block-io layer of the kernel is concerned, the guarantee is met. If the writes go to a ramdisk, or to an IDE drive with write-back cache enabled, or to a disk with write-back cache disabled but without redundancy, or to a high-end storage array with double-parity protection but without a continuous data protection offsite solution, things may still go wrong.

It is up to qemu to provide a strong link in the data reliability chain, not to ensure that the entire chain is perfect. That's up to the administrator or builder of the system.

> If you have a normal laptop, your disk has a cache. That cache does not have a battery backup. Under normal operations, the cache is acting in write-back mode and when you do a write, the disk will report the write as completed even though it is not actually on disk. If you really care about the data being on disk, you have to either use a disk with a battery backed cache (much more expensive) or enable write-through caching (will significantly reduce performance).
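For concreteness, this is roughly how an administrator would inspect and change the drive-level cache mode with hdparm (a sketch, not from the original mail; the device name is an example and the commands require root):

```shell
# Query the drive's current write-caching setting:
hdparm -W /dev/sda

# Disable the write-back cache, forcing write-through behaviour at the
# drive level (expect a substantial drop in write throughput):
hdparm -W 0 /dev/sda
```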


I think that with SATA NCQ, this is no longer true. The drive will report the write complete when it is on disk, and utilize multiple outstanding requests to get coalescing and reordering. Not sure about this, though -- some drives may still be lying.

> In the case of KVM, even using write-back caching with the host page cache, we are still honoring the guarantee of O_DIRECT. We just have another level of caching that happens to be write-back.

No, we are lying. That's fine if the user tells us to lie, but not otherwise.

>> I think there is a bug here. If I open a file with the O_DIRECT flag
>> and the host reports back to me that the transfer has completed when
>> in fact it's still in the host cache, that's a bug: it violates the
>> open()/write() contract, and there is no guarantee that the data
>> will actually be written.

> This is very important, O_DIRECT does *not* guarantee that data actually resides on disk. There are many possible places where it can be cached (in the storage controller, in the disks themselves, in a RAID controller).

O_DIRECT guarantees that the kernel is not the weak link in the reliability chain.


>> So I guess the real issue isn't what the default should be (although
>> the performance team at Red Hat would vote for cache=off),

> The consensus so far has been that we want to still use the host page cache but use it in write-through mode. This would mean that the guest would only see data completion when the host's storage subsystem reports the write as having completed. This is not the same as cache=off but I think gives the real effect that is desired.

I am fine with write-through as default, but cache=off should be a supported option.
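The modes under discussion map onto qemu's -drive cache= parameter roughly as follows (a sketch; the guest image path is an example, and the spelling of the options has varied across qemu versions, with cache=none being the later name for the O_DIRECT mode called cache=off here):

```shell
# Host page cache in write-through mode (the proposed default):
qemu-system-x86_64 -drive file=guest.img,cache=writethrough

# Bypass the host page cache entirely via O_DIRECT:
qemu-system-x86_64 -drive file=guest.img,cache=none

# Host page cache in write-back mode (fastest, weakest guarantee):
qemu-system-x86_64 -drive file=guest.img,cache=writeback
```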


> Do you have another argument for using cache=off?

Performance.

--
error compiling committee.c: too many arguments to function




