Re: [Qemu-block] [PATCH for-2.6 v2 0/3] Bug fixes for gluster


From: Rik van Riel
Subject: Re: [Qemu-block] [PATCH for-2.6 v2 0/3] Bug fixes for gluster
Date: Wed, 20 Apr 2016 14:37:28 -0400

On Wed, 2016-04-20 at 06:40 -0400, Ric Wheeler wrote:
> On 04/20/2016 05:24 AM, Kevin Wolf wrote:
> > 
> > Am 20.04.2016 um 03:56 hat Ric Wheeler geschrieben:
> > > 
> > > On 04/19/2016 10:09 AM, Jeff Cody wrote:
> > > > 
> > > > On Tue, Apr 19, 2016 at 08:18:39AM -0400, Ric Wheeler wrote:
> > > > > 
> > > > > On 04/19/2016 08:07 AM, Jeff Cody wrote:
> > > > > > 
> > > > > > Bug fixes for gluster; third patch is to prevent
> > > > > > a potential data loss when trying to recover from
> > > > > > a recoverable error (such as ENOSPC).
> > > > > Hi Jeff,
> > > > > 
> > > > > Just a note, I have been talking to some of the disk drive
> > > > > people here at LSF (the kernel summit for file and storage
> > > > > people) and got a non-public confirmation that individual
> > > > > storage devices (SATA or SCSI drives) can also dump cache
> > > > > state when a synchronize cache command fails.  Also followed
> > > > > up with Rik van Riel - in the page cache in general, when we
> > > > > fail to write back dirty pages, they are simply marked "clean"
> > > > > (which means effectively that they get dropped).
> > > > > 
> > > > > Long-winded way of saying that I think this scenario is not
> > > > > unique to gluster - any failed fsync() to a file (or block
> > > > > device) might be an indication of permanent data loss.
> > > > > 
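
To put that in concrete terms: a minimal POSIX sketch (illustrative helper
name, not code from this series) of what a failed fsync() has to mean to the
caller once the dirty pages may already have been dropped:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Returns 0 if the data is known durable, -1 if it must be treated as
     * possibly lost: a failed fsync() can mean previously written data
     * never reached the disk and is no longer cached anywhere. */
    static int flush_or_give_up(int fd)
    {
        if (fsync(fd) == 0) {
            return 0;                     /* data reached stable storage */
        }
        fprintf(stderr, "fsync failed: %s; data may be lost\n",
                strerror(errno));
        return -1;
    }
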
> > > > Ric,
> > > > 
> > > > Thanks.
> > > > 
> > > > I think you are right; we likely do need to address how QEMU
> > > > handles fsync failures across the board at some point (2.7?).
> > > > Another point to consider is that QEMU is cross-platform - so not
> > > > only do we have different protocols and filesystems, but also
> > > > different underlying host OSes.  It is likely, like you said, that
> > > > there are other non-gluster scenarios where we have
> > > > non-recoverable data loss on fsync failure.
> > > > 
> > > > With Gluster specifically, if we look at just ENOSPC, does this
> > > > mean that even if Gluster retains its cache after an fsync
> > > > failure, we still can't be sure there was no permanent data loss?
> > > > If we hit ENOSPC during an fsync, I presume that means Gluster
> > > > itself may have encountered ENOSPC from an fsync to the underlying
> > > > storage.  In that case, does Gluster just pass the error up the
> > > > stack?
> > > > 
> > > > Jeff
> > > I still worry that in many non-gluster situations we will have
> > > permanent data loss here.  Specifically, the way the page cache
> > > works, if we fail to write back cached data *at any time*, a
> > > future fsync() will get a failure.
> > And this is actually what saves the semantic correctness. If you
> > threw away data, any following fsync() must fail. This is of course
> > inconvenient because you won't be able to resume a VM that is
> > configured to stop on errors, and it means some data loss, but it's
> > safe because we never tell the guest that the data is on disk when
> > it really isn't.
> > 
> > gluster's behaviour (without resync-failed-syncs-after-fsync set) is
> > different, if I understand correctly. It will throw away the data and
> > then happily report success on the next fsync() call. And this is
> > what causes not only data loss, but corruption.
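
For reference, the option Kevin mentions is a setting on gluster's
write-behind translator; a hedged libgfapi sketch (volume and host names are
placeholders) of how a client can request it before glfs_init():

    #include <glusterfs/api/glfs.h>

    /* Sketch only: ask gluster's write-behind translator to keep data that
     * failed to sync, so a later fsync() cannot spuriously succeed. */
    static glfs_t *open_volume(const char *volume, const char *host)
    {
        glfs_t *fs = glfs_new(volume);
        if (!fs) {
            return NULL;
        }
        glfs_set_volfile_server(fs, "tcp", host, 24007);
        glfs_set_xlator_option(fs, "*-write-behind",
                               "resync-failed-syncs-after-fsync", "on");
        if (glfs_init(fs) < 0) {
            glfs_fini(fs);
            return NULL;
        }
        return fs;
    }
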
> Yes, that makes sense to me - the kernel will remember that it could
> not write data back from the page cache and the future fsync() will
> see an error.
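
Whether that error stays sticky is exactly what decides if retrying is safe;
a plain-POSIX sketch (illustrative helper names) of the two patterns:

    #include <sys/types.h>
    #include <unistd.h>

    /* Anti-pattern: if the dirty pages were dropped after the first
     * failure, the second fsync() has nothing left to flush and may
     * return success even though the data is gone. */
    static int unsafe_flush(int fd)
    {
        if (fsync(fd) == 0) {
            return 0;
        }
        sleep(1);               /* e.g. wait for space to be freed */
        return fsync(fd);       /* success here proves nothing     */
    }

    /* Safer: re-issue the writes from a copy the caller still holds,
     * then sync again. */
    static int safer_flush(int fd, const void *copy, size_t len, off_t off)
    {
        if (fsync(fd) == 0) {
            return 0;
        }
        if (pwrite(fd, copy, len, off) != (ssize_t)len) {
            return -1;
        }
        return fsync(fd);
    }
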
> 
> > 
> > 
> > [ Hm, or having read what's below... Did I misunderstand and Linux
> >   returns failure only for a single fsync(), and on the next one it
> >   returns success again? That would be bad. ]
> I would need to think through that scenario with the memory management
> people to see if that could happen.

It could definitely happen.

1) block on disk contains contents A
2) page cache gets contents B written to it
3) fsync fails
4) page with contents B get evicted from memory
5) block with contents A gets read from disk
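
To make the sequence concrete, a minimal C sketch of the same five steps
(illustrative only; it shows the stale read only when writeback actually
fails, e.g. with dm-error or a full filesystem, and posix_fadvise() merely
approximates the eviction in step 4):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4];
        int fd = open("testfile", O_RDWR);            /* (1) disk holds A         */

        pwrite(fd, "BBBB", 4, 0);                     /* (2) page cache holds B   */

        if (fsync(fd) != 0) {                         /* (3) fsync fails          */
            perror("fsync");
        }

        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* (4) drop the cached page */

        pread(fd, buf, 4, 0);                         /* (5) may read A from disk */
        printf("read back: %.4s\n", buf);
        close(fd);
        return 0;
    }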


