[Qemu-devel] Re: Caching modes
From: Christoph Hellwig
Subject: [Qemu-devel] Re: Caching modes
Date: Tue, 21 Sep 2010 16:26:08 +0200
User-agent: Mutt/1.3.28i
On Mon, Sep 20, 2010 at 07:18:14PM -0500, Anthony Liguori wrote:
> O_DIRECT alone to a pre-allocated file on a normal file system should
> result in the data being visible without any additional metadata
> transactions.
Anthony, for the third time: no. O_DIRECT is a non-portable extension
in Linux (taken from IRIX) and is defined as:
O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own
caching. File I/O is done directly to/from user space buffers.
The O_DIRECT flag on its own makes an effort to transfer data
synchronously, but does not give the guarantees of the O_SYNC flag
that data and necessary metadata are transferred. To guarantee
synchronous I/O, O_SYNC must be used in addition to O_DIRECT.
See NOTES below for further discussion.
A semantically similar (but deprecated) interface for block
devices is described in raw(8).
O_DIRECT does not have any meaning for data integrity; it just tells the
filesystem that it *should* not use the page cache. Even then, various
filesystems fall back to buffered I/O in corner cases.
It does *not* mean the actual disk cache gets flushed, and it does *not*
guarantee anything about metadata, which is very important.
In practice, metadata updates happen when filling a sparse file, when
extending the file size, when using a COW filesystem, and when converting
preallocated to fully allocated extents, and they could happen in many
more cases depending on the filesystem implementation.
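
To make the point concrete, here is a minimal sketch of a write that is both O_DIRECT and durable. The helper name direct_write_durable, the 4096-byte alignment, and the fallback to buffered I/O are assumptions made for illustration; they are not taken from QEMU. The explicit fdatasync() is what covers the metadata and the disk write cache, not O_DIRECT.

```c
#define _GNU_SOURCE              /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0               /* not available here: behaves as plain buffered I/O */
#endif

#define DIO_ALIGN 4096           /* assumed; the real requirement depends on device and filesystem */

/*
 * Hypothetical helper: write `len` bytes of value `fill` at offset 0 of
 * `path` using O_DIRECT, then make the result durable with fdatasync().
 * Returns 0 on success, -1 on error with errno set.
 */
int direct_write_durable(const char *path, int fill, size_t len)
{
    assert(len > 0 && len % DIO_ALIGN == 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL) {
        /* Some filesystems reject O_DIRECT at open time; retry buffered. */
        fd = open(path, O_WRONLY | O_CREAT, 0600);
    }
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned user buffer. */
    void *buf = NULL;
    int err = posix_memalign(&buf, DIO_ALIGN, len);
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    memset(buf, fill, len);

    int rc = -1;
    if (pwrite(fd, buf, len, 0) == (ssize_t)len) {
        /*
         * The write only (probably) bypassed the page cache. fdatasync()
         * is what commits the metadata needed to read the data back and
         * asks the kernel to flush a volatile disk write cache.
         */
        if (fdatasync(fd) == 0)
            rc = 0;
    }

    int saved = errno;
    free(buf);
    close(fd);
    errno = saved;
    return rc;
}
```

Calling direct_write_durable("disk.img", 0, 4096) would zero the first 4 KiB of a hypothetical image file; dropping the fdatasync() call leaves exactly the gaps described above.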
> >Barriers are a Linux-specific implementation detail that is in the
> >process of going away, probably in Linux 2.6.37. But if you want
> >O_DSYNC semantics with a volatile disk write cache there is no way
> >around using a cache flush or the FUA bit on all I/O caused by it.
>
> If you have a volatile disk write cache, then we don't need O_DSYNC
> semantics.
If you present a volatile write cache to the guest you do indeed not
need O_DSYNC and can rely on the guest sending fdatasync calls when it
wants to flush the cache. But in the statement above you can replace
O_DSYNC with fdatasync and it will still be correct. O_DSYNC in current
Linux kernels is nothing but an implicit range fdatasync after each
write.
> > We
> >currently use the cache flush, and although I plan to experiment a bit
> >more with the FUA bit for O_DIRECT | O_DSYNC writes I would be very
> >surprised if they actually are any faster.
> >
>
> The thing I struggle with understanding is that if the guest is sending
> us a write request, why are we sending the underlying disk a write +
> flush request? That doesn't seem logical at all to me.
We send a cache flush request *iff* we present the guest a device
without a volatile write cache, so that it can assume all writes are
stable, and we sit on a device that does have a volatile write cache.
> Even if we advertise WC disable, it should be up to the guest to decide
> when to issue flushes.
No. If we don't claim to have a volatile cache, no guest will ever flush
the cache. Which is only logical, given that we just told it that we
don't have a cache that needs flushing.
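
The rule can be written down as a small decision function. This is only a sketch of the reasoning in this mail; the struct, the function names, and the 'w'/'f' trace encoding are invented for illustration and do not correspond to QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one guest disk and its backing device. */
struct cache_config {
    bool guest_sees_write_cache;  /* do we advertise a volatile write cache to the guest? */
    bool host_cache_volatile;     /* does the backing device have a volatile write cache? */
};

/*
 * Must the host flush after every guest write? Only when the guest was
 * told there is no cache (so it will never flush itself) while the host
 * device does have a volatile one.
 */
bool flush_after_write(const struct cache_config *c)
{
    return !c->guest_sees_write_cache && c->host_cache_volatile;
}

/* Must an explicit guest flush be forwarded to the host device? */
bool flush_on_guest_flush(const struct cache_config *c)
{
    return c->host_cache_volatile;
}

/*
 * Count the host cache flushes caused by a guest request trace, where
 * 'w' is a write and 'f' is an explicit flush. Other characters are
 * ignored.
 */
size_t count_host_flushes(const struct cache_config *c, const char *trace)
{
    assert(c != NULL && trace != NULL);

    size_t flushes = 0;
    for (const char *p = trace; *p != '\0'; p++) {
        if (*p == 'w' && flush_after_write(c))
            flushes++;
        else if (*p == 'f' && flush_on_guest_flush(c))
            flushes++;
    }
    return flushes;
}
```

For a guest that was told there is no write cache, a trace of three writes costs three host flushes; for a guest that sees a write cache, the host flushes only when the guest asks.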
> >ext3 and ext4 have really bad fsync implementations. Just use a better
> >filesystem or bug one of its developers if you want that fixed. But
> >except for disabling the disk cache there is no way to get data integrity
> >without cache flushes (the FUA bit is nothing but an implicit flush).
> >
>
> But why are we issuing more flushes than the guest is issuing if we
> don't have to worry about filesystem metadata (i.e. preallocated storage
> or physical devices)?
Who is "we", and what is the workload/filesystem/kernel combination?
Specific details and numbers, please.