
Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block


From: Nir Soffer
Subject: Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block
Date: Sat, 17 Aug 2019 01:45:14 +0300

On Sat, Aug 17, 2019 at 12:57 AM John Snow <address@hidden> wrote:
On 8/16/19 5:21 PM, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by an XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
>
> In this case we fall back to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.
>

Where does this detection/fallback happen? (Can it be improved?)

In raw_probe_alignment().
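
To make this concrete, here is a minimal standalone sketch of the probing
idea (simplified; this is not QEMU's actual raw_probe_alignment(), and the
function name and size list are only illustrative). A 1-byte O_DIRECT read
should normally fail, so when it succeeds (e.g. because block 0 is an
unallocated hole on Gluster/XFS) the probe cannot trust the result and
falls back to the safe 4096:

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Try O_DIRECT reads at offset 0 with increasing sizes and report
     * the first size the kernel accepts as the request alignment. */
    static size_t probe_request_alignment(int fd)
    {
        static const size_t sizes[] = { 1, 512, 1024, 2048, 4096 };
        void *buf;

        if (posix_memalign(&buf, 4096, 4096) != 0) {
            return 4096;
        }
        for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            if (pread(fd, buf, sizes[i], 0) >= 0) {
                free(buf);
                /* A 1-byte read succeeding means we were fooled; use
                 * the safe value instead of the probed one. */
                return sizes[i] == 1 ? 4096 : sizes[i];
            }
        }
        free(buf);
        return 4096;
    }

    int main(int argc, char **argv)
    {
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY | O_DIRECT)) < 0) {
            return 1;
        }
        printf("request_alignment: %zu\n", probe_request_alignment(fd));
        close(fd);
        return 0;
    }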

This patch explains the issues:

Here Kevin and I discussed ways to improve it:

> When using preallocation=off, we always allocate at least one filesystem
> block:
>
>     $ ./qemu-img create -f raw test.raw 1g
>     Formatting 'test.raw', fmt=raw size=1073741824
>
>     $ ls -lhs test.raw
>     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
>
> I did quick performance tests for these flows:
> - Provisioning a VM with a new raw image.
> - Copying disks with qemu-img convert to a new raw target image.
>
> I installed Fedora 29 server on a raw sparse image, measuring the time
> from clicking "Begin installation" until the "Reboot" button appears:
>
> Before(s)  After(s)     Diff(%)
> -------------------------------
>      356        389        +8.4
>
> I ran this only once, so we cannot tell much from these results.
>

That seems like a pretty big difference for just having pre-allocated a
single block. What was the actual command line / block graph for that test?

Having the first block allocated changes the detected alignment.

Before this patch, we detect request_alignment=1, so we fall back to 4096.
Then we detect buf_align=1, so we fall back to the value of request_alignment.

The guest sees a disk with:
logical_block_size = 512
physical_block_size = 512

But qemu uses:
request_alignment = 4096
buf_align = 4096

storage uses:
logical_block_size = 512
physical_block_size = 512

If the guest does direct I/O with 512-byte alignment, qemu has to copy
the buffers to align them to 4096 bytes.
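
For example, a rough sketch of that emulation cost (an illustration, not
the code path QEMU actually uses): serving a 512-byte guest read through a
file handle that requires 4096-byte alignment means reading a whole aligned
chunk into a bounce buffer and copying the requested bytes out of it:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Read len bytes at an unaligned offset by expanding the request to
     * the surrounding aligned range and copying out the piece we need. */
    static ssize_t read_unaligned(int fd, void *dst, size_t len, off_t offset)
    {
        const off_t align = 4096;                   /* detected alignment */
        off_t start = offset & ~(align - 1);        /* round down */
        off_t end = (offset + (off_t)len + align - 1) & ~(align - 1);
        size_t span = (size_t)(end - start);
        void *bounce;
        ssize_t n;

        if (posix_memalign(&bounce, (size_t)align, span) != 0) {
            return -1;
        }
        n = pread(fd, bounce, span, start);         /* aligned direct read */
        if (n >= offset - start + (off_t)len) {
            memcpy(dst, (char *)bounce + (offset - start), len);
            n = (ssize_t)len;
        } else {
            n = -1;
        }
        free(bounce);
        return n;
    }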

After this patch, qemu detects the alignment correctly, so we have:

guest
logical_block_size = 512
physical_block_size = 512

qemu
request_alignment = 512
buf_align = 512

storage:
logical_block_size = 512
physical_block_size = 512

We expect this to be more efficient because qemu does not have to emulate
anything.
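
The idea of allocating the first block can be sketched like this (an
illustration of the approach, not necessarily identical to the patch; the
4096-byte size is an assumption meant to cover common filesystem block
sizes). It would be called right after creating or truncating the image
file:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Write one zeroed, aligned block at offset 0 so the filesystem
     * allocates it and later O_DIRECT probes hit real on-disk data. */
    static int allocate_first_block(int fd)
    {
        const size_t len = 4096;
        void *buf;
        int ret;

        if (posix_memalign(&buf, len, len) != 0) {
            return -1;
        }
        memset(buf, 0, len);
        ret = pwrite(fd, buf, len, 0) == (ssize_t)len ? 0 : -1;
        free(buf);
        return ret;
    }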

Was this over a network that could explain the variance?

Maybe. This is a complete install of Fedora 29 server; I'm not sure whether
the installation accesses the network.

> The second test was cloning the installation image with qemu-img
> convert, doing 10 runs:
>
>     for i in $(seq 10); do
>         rm -f dst.raw
>         sleep 10
>         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     done
>
> Here is a table comparing the total time spent:
>
> Type    Before(s)   After(s)    Diff(%)
> ---------------------------------------
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
>
> Here we see a very clear improvement in CPU usage.
>

Hard to argue much with that. I feel a little strange trying to force
the allocation of the first block, but I suppose in practice "almost no
preallocation" is indistinguishable from "exactly no preallocation" if
you squint.

Right.

The real issue is that filesystems and block devices do not expose the alignment
requirement for direct I/O, so we need to use these hacks and assumptions.

With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment, but this does
not help for an XFS filesystem used by Gluster on the server side.
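
For reference, a small sketch of that local query (requires the xfsprogs
headers, and only works on a locally mounted XFS filesystem):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <xfs/xfs.h>

    int main(int argc, char **argv)
    {
        struct dioattr da;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            return 1;
        }
        /* Ask XFS for its direct I/O geometry instead of probing:
         * d_miniosz maps to request_alignment, d_mem to buf_align. */
        if (xfsctl(argv[1], fd, XFS_IOC_DIOINFO, &da) >= 0) {
            printf("request_alignment=%u buf_align=%u max_io=%u\n",
                   da.d_miniosz, da.d_mem, da.d_maxiosz);
        }
        close(fd);
        return 0;
    }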

I hope that Niels is working on adding a similar ioctl for Gluster, so it can expose
the properties of the remote filesystem.

Nir
