qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] file-posix: Support fallocate for block device


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH] file-posix: Support fallocate for block device
Date: Wed, 28 Mar 2018 12:35:44 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 03/27/2018 09:37 PM, zhenwei.pi wrote:
since linux 4.9, block device supports fallocate. kernel issues
block device zereout request and invalidates page cache. So
ioctl(fd, FALLOC_FL_ZERO_RANGE...) is safer than ioctl(fd,

did you mean fallocate() in the first half of the sentence?

BLKZEROOUT...). try to call do_fallocate, if failing, fallback.

use new field "has_fallocate_zero_range" with default value as
true. if do_fallocate returns -ENOTSUP, it will be set false.

Signed-off-by: zhenwei.pi <address@hidden>
---
  block/file-posix.c | 27 +++++++++++++++++----------
  1 file changed, 17 insertions(+), 10 deletions(-)


This feels more like a feature for 2.13, than a bug fix that would fit during freeze for 2.12.

diff --git a/block/file-posix.c b/block/file-posix.c
index d7fb772..842e940 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -159,8 +159,9 @@ typedef struct BDRVRawState {
      bool discard_zeroes:1;
      bool use_linux_aio:1;
      bool page_cache_inconsistent:1;
-    bool has_fallocate;
-    bool needs_alignment;
+    bool has_fallocate:1;
+    bool has_fallocate_zero_range:1;
+    bool needs_alignment:1;
PRManager *pr_mgr;
  } BDRVRawState;
@@ -549,6 +550,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
s->has_discard = true;
      s->has_write_zeroes = true;
+    s->has_fallocate_zero_range = true;

Is blindly setting this to true reasonable, given that...

      if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
          s->needs_alignment = true;
      }
@@ -1365,10 +1367,6 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
      int64_t len;
  #endif
- if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
-        return handle_aiocb_write_zeroes_block(aiocb);
-    }
-
  #ifdef CONFIG_XFS
      if (s->is_xfs) {
          return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
@@ -1376,16 +1374,25 @@ static ssize_t 
handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
  #endif
#ifdef CONFIG_FALLOCATE_ZERO_RANGE
-    if (s->has_write_zeroes) {

...later use is guarded by something learned at compile time?

+    /* since linux 4.9, block device supports fallocate. kernel issues

s/since/Since/
s/device supports/devices support/
s/kernel issues/The kernel issues a/

+     * block device zereout request and invalidates page cache. So
+     * ioctl(fd, FALLOC_FL_ZERO_RANGE...) is safer than ioctl(fd,

Same comment as on commit message; this looks like you meant fallocate rather than ioctl on one of the two uses.

+     * BLKZEROOUT...). try to call do_fallocate, if failing, fallback.

s/try/Try/
s/if failing, fallback/and fall back if that fails/

+     */
+    if (s->has_fallocate_zero_range) {
          int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
                                 aiocb->aio_offset, aiocb->aio_nbytes);
-        if (ret == 0 || ret != -ENOTSUP) {
+        if (ret == 0) {
              return ret;
-        }
-        s->has_write_zeroes = false;
+        } else if (ret == -ENOTSUP)
+            s->has_fallocate_zero_range = false;
      }

Before your patch, if we get any failure other than -ENOTSUP, we exit immediately rather than attempting a fallback. Your code breaks that paradigm, and blindly attempts the fallback even when the failure was something like EIO.

  #endif
+ if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+        return handle_aiocb_write_zeroes_block(aiocb);
+    }
+
  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
      if (s->has_discard && s->has_fallocate) {
          int ret = do_fallocate(s->fd,


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]