Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes
Date: Mon, 20 Sep 2010 11:34:02 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100826 Lightning/1.0b1 Thunderbird/3.0.7

On 09/20/2010 10:55 AM, Kevin Wolf wrote:
On 20.09.2010 17:40, Anthony Liguori wrote:
On 09/20/2010 10:08 AM, Kevin Wolf wrote:
If you're comfortable with a writeback cache for metadata, then you
should also be comfortable with a writeback cache for data in which
case, cache=writeback is the answer.

Well, there is a difference: We don't pollute the host page cache with
guest data and we don't get a virtual "disk cache" as big as the host
RAM, but only a very limited queue of metadata.

Basically, in qemu we have three different types of caching:

1. O_DSYNC, everything is always synced without any explicit request.
     This is cache=writethrough.

I actually think O_DSYNC is the wrong implementation of
cache=writethrough.  cache=writethrough should behave just like
cache=none except that data goes through the page cache.

Then you have cache=writeback, basically.

No. Write through means "write requests are not completed until the data has been acknowledged by the next layer." Write back means "write requests are completed irrespective of the data being acknowledged by the next layer."

Write through ensures consistency with the next layer whereas write back doesn't.

cache=none means that there is no cache. If there is no cache, then you're guaranteed to be consistent with the next layer.
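To make the distinction concrete, here is a toy Python model of the three behaviours described above. It is only a sketch: the class names (NextLayer, WriteThrough, WriteBack, NoCache) are hypothetical, nothing here is QEMU code, and "acknowledged" is modelled simply as the lower layer's write() returning.

```python
class NextLayer:
    """Stands in for whatever sits below the cache (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        # Returning from this call models the layer acknowledging the data.
        self.blocks[lba] = data


class WriteThrough:
    """A write completes only after the next layer has acknowledged it."""

    def __init__(self, lower):
        self.lower = lower
        self.cache = {}

    def write(self, lba, data):
        self.cache[lba] = data
        self.lower.write(lba, data)  # wait for the acknowledgement first
        return "complete"

    def flush(self):
        pass  # nothing is ever dirty


class WriteBack:
    """A write completes whether or not the next layer has seen it."""

    def __init__(self, lower):
        self.lower = lower
        self.cache = {}
        self.dirty = set()

    def write(self, lba, data):
        self.cache[lba] = data
        self.dirty.add(lba)
        return "complete"  # completed before the next layer has the data

    def flush(self):
        # Only an explicit flush pushes dirty blocks down.
        for lba in sorted(self.dirty):
            self.lower.write(lba, self.cache[lba])
        self.dirty.clear()


class NoCache:
    """No cache at all: every write goes straight to the next layer."""

    def __init__(self, lower):
        self.lower = lower

    def write(self, lba, data):
        self.lower.write(lba, data)
        return "complete"

    def flush(self):
        pass
```

In this model WriteThrough and NoCache are both consistent with the next layer after every completed write; only WriteBack can report completion while the next layer is still stale.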

The only reason it's exposed as writeback at the emulation layer is that *usually* disks have writeback caches. The fact that writethrough is currently stronger than the next layer (in that it breaks through any writeback cache the disk may have) is not a designed feature. It's an accident.

Had ext3 enabled barriers by default, writethrough would not use O_DSYNC.
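For reference, the two ways of getting a write onto stable storage that are being contrasted here look roughly like this at the system-call level. This is a minimal Python sketch, not QEMU code: the helper names and file names are made up for illustration, and O_DSYNC is looked up with getattr because not every platform defines it.

```python
import os
import tempfile


def write_dsync(path, data):
    """Open with O_DSYNC so each write returns only once the data is synced."""
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_flush(path, data):
    """Plain write through the page cache, followed by an explicit flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # the explicit flush request
        return written
    finally:
        os.close(fd)


def demo():
    """Write the same payload both ways and read both files back."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "dsync.img")
        b = os.path.join(tmp, "flush.img")
        write_dsync(a, b"metadata")
        write_then_flush(b, b"metadata")
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read(), fb.read()
```

Both helpers leave the same bytes in the file; the difference is only in when the write is allowed to complete relative to the sync.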

2. Nothing is ever synced. This is cache=unsafe.

3. We present a writeback disk cache to the guest and the guest needs
     to explicitly flush to get its data safely on disk. This is
     cache=writeback and cache=none.

We shouldn't tie the virtual disk cache to which cache= option is used
in the host.  cache=none means that all requests go directly to the
disk.  cache=writeback means the host acts as a writeback cache.

No, that's not the meaning of cache=none if you take the disk cache into
consideration.

You can't possibly take into account the disk cache because we don't know anything about the disk cache in QEMU. Just assuming disks always have writeback caches is wrong.

  It might be what you think should be the meaning of
cache=none, but it's not what it means in any qemu release.

That's precisely what it has meant in every release since I first introduced the cache options.

We're still lacking modes for O_DSYNC | O_DIRECT and unsafe | O_DIRECT,
but they are entirely possible, because it's two different dimensions.
(And I think Christoph was planning to actually make it two independent
options)

I don't really think O_DSYNC | O_DIRECT makes much sense.

Maybe, maybe not. It's just a missing entry in the matrix.
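The "matrix" under discussion can be written out as a small Python sketch. It follows the two-dimensional view from the quoted text (host page cache or O_DIRECT on one axis, sync policy on the other); the axis labels are illustrative strings, not QEMU identifiers, and the mapping reflects that framing rather than any agreed definition.

```python
from itertools import product

# Dimension 1: does I/O go through the host page cache or bypass it?
HOST_CACHE = ("page cache", "O_DIRECT")

# Dimension 2: when does data get synced?
SYNC = ("O_DSYNC", "flush on request", "never (unsafe)")

# The combinations that have a cache= name, per the quoted description.
NAMED = {
    ("page cache", "O_DSYNC"): "cache=writethrough",
    ("page cache", "flush on request"): "cache=writeback",
    ("page cache", "never (unsafe)"): "cache=unsafe",
    ("O_DIRECT", "flush on request"): "cache=none",
}


def mode_matrix():
    """Every (host cache, sync policy) pair, mapped to its name or None."""
    return {combo: NAMED.get(combo) for combo in product(HOST_CACHE, SYNC)}


def missing_modes():
    """Pairs that have no cache= option."""
    return [combo for combo, name in mode_matrix().items() if name is None]
```

Under these assumptions missing_modes() yields exactly the two combinations mentioned above: O_DSYNC with O_DIRECT, and unsafe with O_DIRECT.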

Regards,

Anthony Liguori


