From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error
Date: Wed, 6 Apr 2016 13:51:59 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 06.04.2016 at 13:41, Kevin Wolf wrote:
> On 06.04.2016 at 13:19, Ric Wheeler wrote:
> > 
> > We had a thread discussing this off the upstream list.
> > 
> > My summary of the thread is that I don't understand why gluster
> > should drop cached data after a failed fsync() for any open file.
> 
> It certainly shouldn't, but it does by default. :-)
> 
> Have a look at commit 3fcead2d in glusterfs.git, which at least
> introduces an option to get usable behaviour:
> 
>     { .key = {"resync-failed-syncs-after-fsync"},
>       .type = GF_OPTION_TYPE_BOOL,
>       .default_value = "off",
>       .description = "If sync of \"cached-writes issued before fsync\" "
>                      "(to backend) fails, this option configures whether "
>                      "to retry syncing them after fsync or forget them. "
>                      "If set to on, cached-writes are retried "
>                      "till a \"flush\" fop (or a successful sync) on sync "
>                      "failures. "
>                      "fsync itself is failed irrespective of the value of "
>                      "this option. ",
>     },
> 
> As you can see, the default is still to drop cached data, and this is
> with the file still opened. qemu needs to make sure that this option is
> set, and if Jeff's comment in the code below is right, there is no way
> currently to make sure that the option isn't silently ignored.
> 
> Can we get some function that sets an option and fails if the option is
> unknown? Or one that queries the state after setting an option, so we
> can check whether we succeeded in switching to the mode we need?
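> 
> To illustrate the kind of interface I mean (hypothetical sketches; no
> such functions exist in libgfapi today):
> 
>     /* Like glfs_set_xlator_option(), but fails (say, with ENOTSUP)
>      * when no xlator recognises the key, instead of silently
>      * succeeding. */
>     int glfs_set_xlator_option_strict(glfs_t *fs, const char *xlator,
>                                       const char *key, const char *value);
> 
>     /* Or a getter, so we can read back the effective value after
>      * setting it and verify that it actually took effect. */
>     int glfs_get_xlator_option(glfs_t *fs, const char *xlator,
>                                const char *key, char **value);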
> 
> > For closed files, I think it might still happen but this is the same
> > as any file system (and unlikely to be the case for qemu?).
> 
> Our problem is only with open images. Dropping caches for files that
> qemu doesn't use any more is fine as far as I'm concerned.
> 
> Note that our usage can involve cases where we reopen a file with
> different flags, i.e. first open a second file descriptor, then close
> the first one. The image was never completely closed here and we would
> still want the cache to preserve our data in such cases.

Hm, actually, maybe we should just call bdrv_flush() before reopening an
image, and if an error is returned, we abort the reopen. It's far from
being a hot path, so the overhead of a flush shouldn't matter, and it
seems we're taking an unnecessary risk without doing this.
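
To make that concrete, a rough sketch (simplified; the real reopen path
goes through bdrv_reopen_queue()/bdrv_reopen_multiple(), so the check
would have to sit there, and the helper below is made up):

    /* Flush before committing to the reopen; if the flush fails, the
     * cached data may already be lost, so abort instead of reopening
     * on top of a possibly inconsistent image. */
    static int bdrv_reopen_with_flush(BlockDriverState *bs, int flags,
                                      Error **errp)
    {
        int ret = bdrv_flush(bs);
        if (ret < 0) {
            error_setg_errno(errp, -ret, "Flush failed, aborting reopen");
            return ret;
        }
        return bdrv_reopen(bs, flags, errp);
    }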

Kevin

> > I will note that Linux in general had (and still has, I think?) the
> > behavior that once the process closes a file (or exits), we lose
> > the context to return an error to. From that point on, any failed
> > IO from the page cache to the target disk will be dropped from the
> > cache. Holding things in the cache would lead it to fill with old
> > data that is not really recoverable, and we have no good way to
> > know whether the situation is repairable or how long that might
> > take. Upstream kernel people have debated this; the behavior might
> > be tweaked for certain types of errors.
> 
> That's fine, we just don't want the next fsync() to signal success when
> in reality the cache has thrown away our data. As soon as we close the
> image, there is no next fsync(), so you can do whatever you like.
> 
> Kevin
> 
> > On 04/06/2016 07:02 AM, Kevin Wolf wrote:
> > >[ Adding some CCs ]
> > >
> > >On 06.04.2016 at 05:29, Jeff Cody wrote:
> > >>Upon receiving an I/O error after an fsync, by default gluster will
> > >>dump its cache.  However, QEMU will retry the fsync, which is especially
> > >>useful when encountering errors such as ENOSPC when using the werror=stop
> > >>option.  When using caching with gluster, however, the last written data
> > >>will be lost upon encountering ENOSPC.  Using the cache xlator option of
> > >>'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > >>cached data after a failed fsync, so that ENOSPC and other transient
> > >>errors are recoverable.
> > >>
> > >>Signed-off-by: Jeff Cody <address@hidden>
> > >>---
> > >>  block/gluster.c | 27 +++++++++++++++++++++++++++
> > >>  configure       |  8 ++++++++
> > >>  2 files changed, 35 insertions(+)
> > >>
> > >>diff --git a/block/gluster.c b/block/gluster.c
> > >>index 30a827e..b1cf71b 100644
> > >>--- a/block/gluster.c
> > >>+++ b/block/gluster.c
> > >>@@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
> > >>          goto out;
> > >>      }
> > >>+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > >>+    /* Without this, if fsync fails for a recoverable reason (for instance,
> > >>+     * ENOSPC), gluster will dump its cache, preventing retries.  This means
> > >>+     * almost certain data loss.  Not all gluster versions support the
> > >>+     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > >>+     * discover during runtime if it is supported (this api returns success for
> > >>+     * unknown key/value pairs) */
> > >Honestly, this sucks. There is apparently no way to operate gluster so
> > >we can safely recover after a failed fsync. "We hope everything is fine,
> > >but depending on your gluster version, we may now corrupt your image"
> > >isn't very good.
> > >
> > >We need to consider very carefully if this is good enough to go on after
> > >an error. I'm currently leaning towards "no". That is, we should only
> > >enable this after Gluster provides us a way to make sure that the option
> > >is really set.
> > >
> > >>+    ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > >>+                                           "resync-failed-syncs-after-fsync",
> > >>+                                           "on");
> > >>+    if (ret < 0) {
> > >>+        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > >>+        ret = -errno;
> > >>+        goto out;
> > >>+    }
> > >>+#endif
> > >We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> > >In this case (as well as theoretically in the case that the option
> > >didn't take effect - if only we could know about it), a failed
> > >glfs_fsync_async() is fatal and we need to stop operating on the image,
> > >i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> > >The guest will see a broken disk that fails all I/O requests, but that's
> > >better than corrupting data.
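> > >
> > >Concretely, the error path could look roughly like this (a sketch
> > >only, not a finished patch; acb.ret is assumed from the surrounding
> > >code in qemu_gluster_co_flush_to_disk()):
> > >
> > >    /* If the fsync failed and we have no guarantee that gluster
> > >     * kept its cached writes, mark the BDS as broken.  All further
> > >     * requests then fail with -ENOMEDIUM instead of silently
> > >     * working on top of a cache that dropped our data. */
> > >    if (acb.ret < 0) {
> > >        bs->drv = NULL;
> > >        return acb.ret;
> > >    }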
> > >
> > >Kevin
> > 
> 


