[Qemu-devel] Re: Caching modes


From: Christoph Hellwig
Subject: [Qemu-devel] Re: Caching modes
Date: Tue, 21 Sep 2010 01:17:42 +0200
User-agent: Mutt/1.3.28i

On Mon, Sep 20, 2010 at 03:11:31PM -0500, Anthony Liguori wrote:
> >>All read and write requests SHOULD avoid any type of caching in the
> >>host.  Any write request MUST complete after the next level of storage
> >>reports that the write request has completed.  A flush from the guest
> >>MUST complete after all pending I/O requests for the guest have been
> >>completed.
> >>
> >>As an implementation detail, with the raw format, these guarantees are
> >>only in place for preallocated images.  Sparse images do not provide as
> >>strong a guarantee.
> >>     
> >That's not how cache=none ever worked nor works currently.
> >   
> 
> How does it work today compared to what I wrote above?

From the guest's point of view it works exactly as you describe
cache=writeback.  There are no ordering or cache-flushing guarantees.  By
using O_DIRECT we do bypass the host file cache, but we don't even try
on the others (the disk cache, and committing the metadata transactions
that are required to actually see the committed data for sparse,
preallocated or growing images).

What you describe above is the equivalent of O_DSYNC|O_DIRECT, which
doesn't exist in current qemu, except that O_DSYNC|O_DIRECT also
guarantees the semantics for sparse images.  Sparse images really aren't
special in any way - preallocation using posix_fallocate or COW
filesystems like btrfs, nilfs2 or zfs have exactly the same issues.

> >                       | WC enable | WC disable
> >-----------------------------------------------
> >direct                |           |
> >buffer                |           |
> >buffer + ignore flush |           |
> >
> >currently we only have:
> >
> >  cache=none         direct + WC enable
> >  cache=writeback    buffer + WC enable
> >  cache=writethrough buffer + WC disable
> >  cache=unsafe               buffer + ignore flush + WC enable
> >   
> 
> Where does O_DSYNC fit into this chart?

O_DSYNC is used for all WC disable modes.

> Do all modern filesystems implement O_DSYNC without generating 
> additional barriers per request?
> 
> Having a barrier per-write request is ultimately not the right semantic 
> for any of the modes.  However, without the use of O_DSYNC (or 
> sync_file_range(), which I know you dislike), I don't see how we can 
> have reasonable semantics without always implementing write back caching 
> in the host.

Barriers are a Linux-specific implementation detail that is in the
process of going away, probably in Linux 2.6.37.  But if you want
O_DSYNC semantics with a volatile disk write cache, there is no way
around using a cache flush or the FUA bit on all I/O caused by it.  We
currently use the cache flush, and although I plan to experiment a bit
more with the FUA bit for O_DIRECT | O_DSYNC writes, I would be very
surprised if it actually is any faster.

> I'm certainly happy to break up the caching option.  However, I still 
> don't know how we get a reasonable equivalent to cache=writethrough 
> without assuming that ext4 is mounted without barriers enabled.

There are two problems here - one is a Linux-wide problem, and that's
the barrier primitive, which is currently the only way to flush a
volatile disk cache.  We've sorted this out for 2.6.37.  The other is
that ext3 and ext4 have really bad fsync implementations.  Just use a
better filesystem, or bug one of its developers if you want that fixed.
But except for disabling the disk cache there is no way to get data
integrity without cache flushes (the FUA bit is nothing but an implicit
flush).



