Re: [Qemu-devel] [Qemu-block] [PATCH v8 0/2] block: enforce minimal 4096 alignment in qemu_blockalign


From:	Stefan Hajnoczi
Subject:	Re: [Qemu-devel] [Qemu-block] [PATCH v8 0/2] block: enforce minimal 4096 alignment in qemu_blockalign
Date:	Wed, 13 May 2015 16:32:23 +0100
User-agent:	Mutt/1.5.23 (2014-03-12)

On Tue, May 12, 2015 at 05:30:54PM +0300, Denis V. Lunev wrote:
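> I used the following program for testing:
> #define _GNU_SOURCE
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <malloc.h>
> #include <string.h>
> 
> int main(int argc, char *argv[])
> {
>     void *buf;
>     int fd, i = 0, align;
> 
>     if (argc < 3) {
>         fprintf(stderr, "usage: %s <file> <align>\n", argv[0]);
>         return 1;
>     }
>     fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     if (fd < 0) {
>         perror("open");
>         return 1;
>     }
>     align = atoi(argv[2]);
> 
>     /* For align < 4096, retry until memalign() returns a buffer that
>      * is NOT page aligned; i counts the page-aligned attempts. */
>     do {
>         buf = memalign(align, 4096);
>         if (buf == NULL) {
>             perror("memalign");
>             return 1;
>         }
>         if (align >= 4096)
>             break;
>         if ((unsigned long)buf & 4095)
>             break;
>         i++;
>     } while (1);
>     printf("%d %p\n", i, buf);
> 
>     memset(buf, 0x11, 4096);
> 
>     /* 100000 O_DIRECT writes of 4 KiB; lseek() takes (fd, offset, whence) */
>     for (i = 0; i < 100000; i++) {
>         lseek(fd, 4096, SEEK_CUR);
>         write(fd, buf, 4096);
>     }
> 
>     close(fd);
>     return 0;
> }
> for i in `seq 1 30` ; do ./a.out aa <align> ; done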
> 
> The file was placed in an 8 GB partition on the HDD listed below to avoid
> speed variations caused by different offsets on the disk. The results are
> reliable:
> - 189 vs 180 seconds on Linux 3.16
> 
> The following setups were tested:
> 1) ext4 with a 1024-byte block size on an SSD with 512/512-byte
>    physical/logical sectors
> 2) ext4 with a 4096-byte block size on an SSD with 512/512-byte
>    physical/logical sectors
> 3) ext4 with a 4096-byte block size on a rotational disk (WDC WD20EZRX)
>    with 4096/512-byte physical/logical sectors
> 4) xfs with a 4096-byte block size on an SSD with 512/512-byte
>    physical/logical sectors
> 
> The difference is reliable and the same 5% on every setup. For an image
> in qcow2 format,
>   qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
> is 1% faster.
> 
> qemu-img is also affected. For the sequence
>   qemu-img create -f qcow2 1.img 64G
>   qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
>   time for i in `seq 1 30` ; do qemu-img convert 1.img -t none -O raw 2.img ; rm -rf 2.img ; done
> the timed loop takes around 126 vs 119 seconds.
> 
> The explanation for the performance improvement is quite interesting.
> From the kernel's point of view, each request to the disk was split
> in two. This can be seen with blktrace:
>   9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
>   9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
>   9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
>   9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
>   9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
>   9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
>   9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
>   9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
> After the patch the pattern becomes normal:
>   9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
>   9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
>   9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
>   9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
> and the number of requests sent to the disk (which can be calculated by
> counting the lines of blktrace output) is reduced by about a factor of two.
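The request count can also be tallied programmatically. This is a minimal sketch that is not part of the patch. It assumes the whitespace-separated text layout of the excerpts above: device, CPU, sequence number, timestamp, PID, action, RWBS, start sector, `+`, size in sectors, process. The helper name `count_queued` is hypothetical. It counts `Q` (queued) events, sums their sizes in 512-byte sectors, and skips lines that do not match the layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tally queued ("Q") requests in blktrace text output.
 * Assumes lines shaped like:
 *   9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
 * Returns the number of Q events; stores the summed request sizes
 * (in 512-byte sectors) in *sectors. Non-matching lines are ignored. */
static int count_queued(const char *lines[], int n, unsigned long *sectors)
{
    int requests = 0;

    assert(sectors != NULL);
    *sectors = 0;
    for (int i = 0; i < n; i++) {
        char action[8], rwbs[8];
        unsigned long long start;
        unsigned long size;

        /* skip device, CPU, sequence number, timestamp and PID */
        if (sscanf(lines[i], "%*s %*s %*s %*s %*s %7s %7s %llu + %lu",
                   action, rwbs, &start, &size) != 4)
            continue;
        if (strcmp(action, "Q") != 0)
            continue;
        requests++;
        *sectors += size;
    }
    return requests;
}
```

Fed the seven contiguous requests from the first excerpt, the helper reports 7 requests covering 4096 sectors (2 MiB). Fed the four lines from the second excerpt, it reports 4 requests covering the same 4096 sectors. This is in line with the request count dropping by close to half.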
> 
> Both qemu-img and qemu-io are affected, while qemu-kvm is not. The guest
> does its job well, and real requests arrive properly aligned (to a page).
> 
> Changes from v7:
> - make assignment from v6 unconditional (Kevin)
> 
> Changes from v6:
> - explicitly assign opt_mem_alignment in raw-posix.c with
>   MAX(s->buf_align, getpagesize()) (Kevin)
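The v6/v7 assignment can be illustrated with a minimal sketch. It uses a simplified stand-in for the driver state, and `struct raw_state_sketch` and `set_opt_mem_alignment` are hypothetical names. `MAX` is defined locally rather than taken from QEMU's headers. Only `buf_align`, `opt_mem_alignment`, `getpagesize()` and the unconditional `MAX(...)` come from the changelog.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical, simplified stand-in for the raw-posix driver state. */
struct raw_state_sketch {
    size_t buf_align;          /* alignment probed for O_DIRECT buffers */
    size_t opt_mem_alignment;  /* alignment used for bounce buffers */
};

/* Unconditionally raise the bounce-buffer alignment to at least one page,
 * while keeping any larger probed alignment. */
static void set_opt_mem_alignment(struct raw_state_sketch *s)
{
    size_t page = (size_t)getpagesize();

    assert(s != NULL);
    s->opt_mem_alignment = MAX(s->buf_align, page);
}
```

On a host with 4 KiB pages, a probed `buf_align` of 512 yields an `opt_mem_alignment` of 4096. A probed alignment larger than a page is kept unchanged.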
> 
> Changes from v5:
> - found the justification from the kernel's point of view
> - fixed checkpatch warnings in patch 2
> 
> Changes from v4:
> - patches reordered
> - dropped conversion from 512 to BDRV_SECTOR_SIZE
> - getpagesize() is replaced with MAX(4096, getpagesize()) as suggested by
>   Kevin
> 
> Changes from v3:
> - use a portable way to calculate the system page size
> - replaced hard-coded 512/4096 values with proper macros/values
> 
> Changes from v2:
> - opt_mem_alignment is split into opt_mem_alignment for bounce buffering
>   and min_mem_alignment for checking buffers coming from the guest.
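The v2 split can be sketched as two separate uses of two limits. A small minimum alignment decides whether a caller-supplied buffer can be used directly, and a larger optimal alignment is applied only when allocating a bounce buffer. This is a hedged sketch, not QEMU code. `struct align_limits`, `buffer_is_aligned` and `alloc_bounce_buffer` are hypothetical names, and POSIX `posix_memalign` stands in for QEMU's own allocation helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair of limits mirroring the v2 description. */
struct align_limits {
    size_t min_mem_alignment; /* caller buffers must meet this (e.g. 512) */
    size_t opt_mem_alignment; /* bounce buffers are allocated with this */
};

/* Check a caller-supplied buffer against the *minimal* alignment only,
 * so a 512-aligned guest buffer is used directly, without bouncing. */
static bool buffer_is_aligned(const struct align_limits *l, const void *buf,
                              size_t len)
{
    assert(l->min_mem_alignment > 0);
    return ((uintptr_t)buf % l->min_mem_alignment) == 0 &&
           (len % l->min_mem_alignment) == 0;
}

/* Allocate a bounce buffer with the *optimal* alignment (a power of two).
 * Returns NULL on failure; the caller releases it with free(). */
static void *alloc_bounce_buffer(const struct align_limits *l, size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, l->opt_mem_alignment, len) != 0)
        return NULL;
    return p;
}
```

Keeping the two limits apart is what lets the series raise bounce-buffer alignment to a page without forcing extra bouncing for guest buffers that are only 512-byte aligned.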
> 
> Changes from v1:
> - enforce 4096 alignment in qemu_(try_)blockalign; avoid touching the
>   bdrv_qiov_is_aligned path so as not to force additional bounce
>   buffering, as suggested by Paolo
> - reduced 10% to 5% in the patch description to better fit the 180 vs 189
>   difference
> 
> Signed-off-by: Denis V. Lunev <address@hidden>
> CC: Paolo Bonzini <address@hidden>
> CC: Kevin Wolf <address@hidden>
> CC: Stefan Hajnoczi <address@hidden>
> 
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

