Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
From: Rusty Russell
Subject: Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
Date: Wed, 5 May 2010 14:28:41 +0930
User-agent: KMail/1.13.2 (Linux/2.6.32-21-generic; KDE/4.4.2; i686; ; )
On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
> Jens Axboe wrote:
> > On Tue, May 04 2010, Rusty Russell wrote:
> > > ISTR someone mentioning a desire for such an API years ago, so CC'ing the
> > > usual I/O suspects...
> >
> > It would be nice to have a fuller API for this, but the reality is
> > that only the flush approach is really workable. Even just strict
> > ordering of requests could only be supported on SCSI, and even there the
> > kernel still lacks proper guarantees on error handling to prevent
> > reordering.
>
> There are a few I/O scheduling differences that might be useful:
>
> 1. The I/O scheduler could freely move WRITEs before a FLUSH but not
> before a BARRIER. That might be useful for time-critical WRITEs,
> and those issued with high I/O priority.
This is only because no one actually wants flushes or barriers, though
that is all the I/O people seem to offer. We really want "<these writes> must
occur before <this write>". That offers maximum choice to the I/O subsystem
and potentially to smart (virtual?) disks.
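To make the "<these writes> must occur before <this write>" idea concrete, here is a minimal sketch of a scheduler that is handed explicit dependencies instead of flushes. `struct wreq`, `MAX_DEPS`, `MAX_REQS` and `schedule_writes()` are hypothetical names for illustration only, not an existing kernel or virtio interface, and the sketch simplifies by treating an issued write as complete. Any write whose prerequisites are done is eligible, and the most urgent eligible write goes first, which is the freedom a full barrier would take away.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPS 4
#define MAX_REQS 64

/* Hypothetical write request: it may only be issued after every
 * request listed in deps[] (indices into the same array). */
struct wreq {
    int prio;            /* higher value = more urgent */
    int ndeps;
    int deps[MAX_DEPS];
};

/* Compute one legal issue order, treating an issued write as complete.
 * Among writes whose dependencies are all satisfied, the one with the
 * highest priority is chosen. Returns the number of writes placed in
 * order[]; a result smaller than n means the dependencies form a cycle. */
int schedule_writes(const struct wreq *reqs, int n, int *order)
{
    bool done[MAX_REQS] = { false };
    int count = 0;

    assert(n <= MAX_REQS);
    while (count < n) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            bool ready = true;
            for (int d = 0; d < reqs[i].ndeps; d++) {
                if (!done[reqs[i].deps[d]]) {
                    ready = false;
                    break;
                }
            }
            if (ready && (best < 0 || reqs[i].prio > reqs[best].prio))
                best = i;
        }
        if (best < 0)
            break;  /* nothing ready: cyclic dependencies */
        done[best] = true;
        order[count++] = best;
    }
    return count;
}
```

For example, if write 1 depends on write 0 and an urgent write 2 depends on nothing, the order comes out as 2, 0, 1: the urgent write is not stuck behind an unrelated ordering point.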
> 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
> only for data belonging to a particular file (e.g. fdatasync with
> no file size change, even on btrfs if O_DIRECT was used for the
> writes being committed). That would entail tagging FLUSHes and
> WRITEs with a fs-specific identifier (such as inode number), opaque
> to the scheduler which only checks equality.
This is closer. In userspace I'd be happy with "all prior writes to this
struct file must occur before all future writes", even if the original
guarantees were stronger (i.e. on an inode basis). We currently implement
transactions using four fsync/msync pairs:

    write_recovery_data(fd);
    fsync(fd);
    msync(mmap);

    write_recovery_header(fd);
    fsync(fd);
    msync(mmap);

    overwrite_with_new_data(fd);
    fsync(fd);
    msync(mmap);

    remove_recovery_header(fd);
    fsync(fd);
    msync(mmap);
Yet we really only need ordering, not a guarantee that the data has actually
hit the disk before the call returns.
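One way to model the per-file request ("all prior writes to this struct file before all future writes") is an epoch counter: a hypothetical ordering call bumps the file's epoch, and a write may be issued only once no earlier-epoch write to the same file is still outstanding. Writes to other files are never held back, and the ordering call itself need not wait for the disk. `struct fwrite` and `may_issue()` are illustrative assumptions, not an existing syscall or block-layer API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-flight write: the file it belongs to, the ordering
 * epoch it was submitted in, and whether the device has completed it. */
struct fwrite {
    int file_id;
    unsigned epoch;
    bool completed;
};

/* A write may be issued once every write to the same file from an
 * earlier epoch has completed. Writes to other files, and writes from
 * the same epoch, impose no constraint, so the scheduler may reorder
 * them freely. */
bool may_issue(const struct fwrite *w, const struct fwrite *all, int n)
{
    for (int i = 0; i < n; i++) {
        if (all[i].file_id == w->file_id &&
            all[i].epoch < w->epoch &&
            !all[i].completed)
            return false;
    }
    return true;
}
```

Under this model, each fsync()/msync() pair in the transaction above would become an epoch increment, so the four steps stay ordered without any of them blocking the caller.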
> In other words, FLUSH can be more relaxed than BARRIER inside the
> kernel. It's ironic that we think of fsync as stronger than
> fbarrier outside the kernel :-)
It's an implementation detail; barrier has less flexibility because it has
less information about what is required. I'm saying I want to give you as
much information as I can, even if you don't use it yet.
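The reordering freedoms Jamie lists above can be written as two small predicates, which also shows why a BARRIER, carrying less information, leaves the scheduler less room than a tagged FLUSH. This is a sketch under stated assumptions: `enum ord_kind`, `struct ord_op` and the convention that tag 0 means "all data" are hypothetical; the tag stands for an opaque filesystem identifier such as an inode number, and the scheduler only compares it for equality.

```c
#include <assert.h>
#include <stdbool.h>

enum ord_kind { ORD_FLUSH, ORD_BARRIER };

/* Hypothetical ordering request. tag is opaque to the scheduler (for
 * example an inode number); tag 0 is assumed to mean "all data". */
struct ord_op {
    enum ord_kind kind;
    unsigned long tag;
};

/* Point 1: a WRITE queued after the op may be moved ahead of a FLUSH,
 * which only promises durability of earlier writes, but never ahead
 * of a BARRIER. */
bool write_may_move_before(const struct ord_op *op)
{
    return op->kind == ORD_FLUSH;
}

/* Point 2: a WRITE queued before the op may be deferred past it only
 * if the op is a FLUSH scoped to a different identifier. */
bool write_may_move_after(unsigned long write_tag, const struct ord_op *op)
{
    return op->kind == ORD_FLUSH && op->tag != 0 && op->tag != write_tag;
}
```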
Thanks,
Rusty.