
Re: [Qemu-devel] [PATCH] bdrv_aio_flush


From: Jens Axboe
Subject: Re: [Qemu-devel] [PATCH] bdrv_aio_flush
Date: Tue, 2 Sep 2008 16:28:07 +0200

On Tue, Sep 02 2008, Ian Jackson wrote:
> Jamie Lokier writes ("Re: [Qemu-devel] [PATCH] bdrv_aio_flush"):
> > Andrea thinks bdrv_aio_flush does guarantee that in flight operations
> > are flushed, while bdrv_flush definitely does not (fsync doesn't).
> 
> I read Andrea as complaining that bdrv_aio_flush _should_ flush in
> flight operations but does not.
> 
> > I vaguely recall from the discussion before, there was uncertainty
> > about whether that is true, and therefore the right thing to do was
> > wait for the in flight AIOs to complete _first_ and then issue an
> > fsync or aio_fsync call.
> 
> Whether bdrv_aio_flush should do this is a question of the qemu
> internal API.
> 
> > The Open Group text for aio_fsync says: "shall asynchronously force
> > all I/O operations [...]  queued at the time of the call to aio_fsync
> > [...]".
> 
> We discussed the meaning of the unix specs before.  I did a close
> textual analysis of it here:
> 
>   http://lists.nongnu.org/archive/html/qemu-devel/2008-04/msg00046.html
> 
> > Since then, I've read the Open Group specifications more closely, and
> > some other OS man pages, and they are consistent that _writes_ always
> > occur in the order they are submitted to aio_write.
> 
> Can you give me a chapter and verse quote for that ?  I'm looking at
> SuSv3 eg
>   http://www.opengroup.org/onlinepubs/009695399/nfindex.html
> 
> All I can see is this:
>   If O_APPEND is set for the file descriptor, write operations append
>   to the file in the same order as the calls were made.
> 
> If things are as you say and aio writes are always executed in the
> order queued, there would be no need to specify that because it would
> be implied by the behaviour of write(2) and O_APPEND.
> 
> > What _can_ be queued is a WRITE FUA command: meaning write some data
> > and flush _this_ data to non-volatile storage.
> 
> In order to implement that interface without flushing other data
> unnecessarily, we need to be able to
> 
>    submit other IO requests
>    submit aio_write request for WRITE FUA
>    asynchronously await completion of the aio_write for WRITE FUA
>    submit and perhaps collect completion of other IO requests
>    collect completion of aio_write for WRITE FUA
>    submit and perhaps collect completion of other IO requests
>    submit aio_fsync (for WRITE FUA)
>    submit and perhaps collect completion of other IO requests
>    collect aio_fsync (for WRITE FUA)
> 
> This is still not perfect because we unnecessarily flush some data,
> thus delaying reporting completion of the WRITE FUA.  But at least
> there is no need to wait for _other_ writes to complete.

I don't see how the above works. There is no ordering dependency between
FUA and non-FUA writes; in fact, FUA writes tend to jump the device
queue, because certain other operating systems use FUA in situations
where that behaviour is appropriate. So unless you do all writes with
FUA, there's no way around a flush for committing dirty data.
Unfortunately we don't have a FLUSH_RANGE command; flush is just a big
sledgehammer.

-- 
Jens Axboe




