qemu-devel

Re: [Qemu-devel] [PATCH] ide.c make write cacheing controllable by guest


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] ide.c make write cacheing controllable by guest
Date: Mon, 25 Feb 2008 20:50:40 +0000
User-agent: Mutt/1.5.13 (2006-08-11)

Ian Jackson wrote:
> The attached patch implements the ATA write cache feature.  This
> enables a guest to control, in the standard way, whether disk writes
> are immediately committed to disk before the IDE command completes, or
> may be buffered in the host.
> 
> In this patch, by default buffering is off, which provides better
> reliability but may have a performance impact.  It would be
> straightforward to change the default, or perhaps offer a command-line
> option, if that would be preferred.
> 
> This patch is derived from one which was originally submitted to the
> Xen tree by Rik van Riel <address@hidden>.

This is a very sensible improvement, imho.

However, I notice that it tells the guest that data is committed to
hard storage when the host has merely called fsync().

On Linux (and other host OSes), fdatasync() and fsync() don't always
commit data to hard storage; sometimes they only commit it to the hard
drive's cache.  (Seriously, just look at fs/ext3/fsync.c: only journal
writes cause the flush, and they aren't done if the inode itself
hasn't changed.)

It may be worth mentioning in the documentation that, for guests which
need strong durability guarantees, e.g. for critical database work or
for filesystem journalling safety following host power failure, it is
not enough to disable the IDE write cache in the guest, even with this
patch.  It is also necessary to disable the host's disk write cache.

Ideally, the host would provide a variation of fdatasync() which flushes
data to hard storage in the same way that kernel filesystem journal
writes can, and Qemu would use that.

But, presently, I'm not aware of any way to do that short of the
administrator disabling the host's disk write cache.

(Darwin provides F_FULLFSYNC.  On Linux, an extra flag to
sync_file_range() suggests itself.  It would need changes to the block
device and elevator APIs, though, as it's a flush command not an
ordering tag, and not always associated with a prior or subsequent
write although there are some coalescing optimisations when it can be.)
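
To illustrate the difference, here is a rough sketch of what a host-side
flush helper might look like.  The name flush_to_stable_storage() is
made up for this example and is not part of the patch.  It assumes the
Darwin fcntl() command F_FULLFSYNC is available when building on that
platform, and on other hosts it can only fall back to fdatasync(), which
may stop at the drive's write cache as described above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not from the patch).
 * Returns 0 on success, -1 on error with errno set.
 */
static int flush_to_stable_storage(int fd)
{
#if defined(__APPLE__) && defined(F_FULLFSYNC)
    /* Darwin: ask the drive to flush its own write cache as well. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Not every filesystem supports F_FULLFSYNC; fall back to fsync(). */
    return fsync(fd);
#else
    /*
     * Other hosts: this hands the data to the drive, but the drive
     * may still be holding it in its write cache on return.
     */
    return fdatasync(fd);
#endif
}
```

On a non-Darwin host a zero return from this helper still says nothing
about the platters, which is why the host's disk write cache has to be
disabled for strong guarantees.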

-- Jamie



