qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] ide.c make write cacheing controllable by guest


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] ide.c make write cacheing controllable by guest
Date: Tue, 26 Feb 2008 07:32:57 +0000
User-agent: Mutt/1.5.13 (2006-08-11)

[To qemu-devel and Chris, I have started a thread on linux-kernel on
this topic.  I've copied the first few paragraphs here, so you can see
what it's about since it's a response to a post here.  But it's
largely off topic for Qemu, and on topic for linux-kernel, so I didn't
cross post lest linux-kernel replies come here.]


To: address@hidden, address@hidden
Subject: Proposal for "proper" durable fsync() and fdatasync()
Message-ID: <address@hidden>
Date: Tue, 26 Feb 2008 07:26:49 +0000


Dear kernel,

This is a proposal to add "proper" durable fsync() and fdatasync() to Linux.

First the problem, then a proposed solution "with benefits", so to speak.

[...]

By durable, I mean that fsync() should actually commit writes to
physical stable storage, not just the disk write cache when that is
enabled.  Databases and guest VMs needs this, or an equivalent
feature, if they aren't to face occasional corruption after power
failure and perhaps some crashes.

The alternative is to disable the disk write cache.  But that isn't
modern practice or recommendation, since I/O write barriers were
implemented and they are much faster.

I was surprised that fsync() doesn't do this already.  There was a lot
of effort put into block I/O write barriers during 2.5, so that
journalling filesystems can force correct write ordering, using disk
flush cache commands.

After all that effort, I was very surprised to notice that Linux 2.6.x
doesn't use that capability to ensure fsync() flushes the disk cache
onto stable storage.

I noticed this following up discussions on the Qemu mailing list,
about guest VMs and how their IDE flush cache command should translate
to fsync() to avoid data loss.  (For guest VMs, fsync() isn't
necessary if the host machine is fine, and it isn't enough (on Linux
host) if the host machine loses power or the hard disk crashes another
way.)

Then I noticed it again, when I was designing a database engine with
filesystem characteristics.  I thought "how do I ensure ordered
journal writes; can I use fdatasync()?" and was surprised to find the
answer is no, I have to use hacks like calling hdparm, and the authors
of major SQL databases seem to brush the problem under a carpet.

(Interestingly, in the Linux 2.4 patches for write barriers, fsync()
seems to be fine, if a bit slow.)

It isn't the first time this topic has come up:

    
http://groups.google.com.br/group/linux.kernel/browse_thread/thread/d343e51655b4ac7c/7ee9bca80977c2d1?#7ee9bca80977c2d1
    ("True fsync() in Linux (on IDE)")

In that thread, it was implied that would be fixed in 2.6.  So I bet
some people are under the illusion that it's fixed in 2.6...


For a while, I've been meaning to bring it up on linux-kernel...


[More on linux-kernel].

Thanks,
-- Jamie




reply via email to

[Prev in Thread] Current Thread [Next in Thread]