Re: [PATCH for-4.2 0/4] qcow2: Fix data corruption on XFS
From: Max Reitz
Subject: Re: [PATCH for-4.2 0/4] qcow2: Fix data corruption on XFS
Date: Fri, 1 Nov 2019 14:40:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1
On 01.11.19 14:36, Denis Lunev wrote:
> On 11/1/19 4:09 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 01.11.2019 15:34, Max Reitz wrote:
>>> On 01.11.19 12:20, Max Reitz wrote:
>>>> On 01.11.19 12:16, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.11.2019 14:12, Max Reitz wrote:
>>>>>> On 01.11.19 11:28, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 01.11.2019 13:20, Max Reitz wrote:
>>>>>>>> On 01.11.19 11:00, Max Reitz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series builds on the previous RFC. The workaround is now applied
>>>>>>>>> unconditionally, regardless of AIO mode and filesystem, because we
>>>>>>>>> don’t know those things for remote filesystems. Furthermore,
>>>>>>>>> bdrv_co_get_self_request() has been moved to block/io.c.
>>>>>>>>>
>>>>>>>>> Applying the workaround unconditionally is fine from a performance
>>>>>>>>> standpoint, because it should actually be dead code, thanks to patch 1
>>>>>>>>> (the elephant in the room). As far as I know, qcow2 (in
>>>>>>>>> handle_alloc_space()) is the only block driver that submits zero
>>>>>>>>> writes as part of normal I/O, so that they can occur concurrently with
>>>>>>>>> other write requests. It still makes sense to keep the workaround in
>>>>>>>>> file-posix, because we can’t really prevent other block drivers from
>>>>>>>>> submitting zero writes as part of normal I/O in the future.
>>>>>>>>>
>>>>>>>>> Anyway, let’s get to the elephant.
>>>>>>>>>
>>>>>>>>> From input by XFS developers
>>>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) it seems
>>>>>>>>> clear that c8bb23cbdbe causes fundamental performance problems on XFS
>>>>>>>>> with aio=native that cannot be fixed. In other cases, c8bb23cbdbe
>>>>>>>>> improves performance or we wouldn’t have it.
>>>>>>>>>
>>>>>>>>> In general, avoiding performance regressions is more important than
>>>>>>>>> improving performance, unless the regressions are just a minor corner
>>>>>>>>> case or insignificant when compared to the improvement. The XFS
>>>>>>>>> regression is no minor corner case, and it isn’t insignificant.
>>>>>>>>> Laurent Vivier has found performance to decrease by as much as 88 %
>>>>>>>>> (on ppc64le, fio in a guest with 4k blocks, iodepth=8: 1662 kB/s,
>>>>>>>>> down from 13.9 MB/s).
>>>>>>>> Ah, crap.
>>>>>>>>
>>>>>>>> I wanted to send this series as early today as possible to get as much
>>>>>>>> feedback as possible, so I’ve only started doing benchmarks now.
>>>>>>>>
>>>>>>>> The obvious
>>>>>>>>
>>>>>>>> $ qemu-img bench -t none -n -w -S 65536 test.qcow2
>>>>>>>>
>>>>>>>> on XFS takes like 6 seconds on master, and like 50 to 80 seconds with
>>>>>>>> c8bb23cbdbe reverted. So now on to guest tests...
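For reference, a minimal way to reproduce that bench comparison looks roughly
like this (the 16G image size and 64 kB cluster size are assumptions, not
necessarily the exact values used):

$ qemu-img create -f qcow2 -o cluster_size=65536 test.qcow2 16G
$ qemu-img bench -t none -n -w -S 65536 test.qcow2
# then rebuild with c8bb23cbdbe reverted and run the same bench again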
>>>>>>> Aha, that's very interesting) What about aio=native, which should be
>>>>>>> slowed down? Could it be tested like this?
>>>>>> That is aio=native (-n).
>>>>>>
>>>>>> But so far I don’t see any significant difference in guest tests (i.e.,
>>>>>> fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1
>>>>>> --ioengine=libaio --thread --numjobs=16 --size=2G --time_based), neither
>>>>>> with 64 kB nor with 2 MB clusters. (But only on XFS, I’ll have to see
>>>>>> about ext4 still.)
>>>>> Hmm, this possibly mostly tests writes to already allocated clusters.
>>>>> Does fio have an option to behave like qemu-img bench with -S 65536,
>>>>> i.e. to write once into each cluster?
>>>> Maybe, but is that a realistic depiction of whether this change is worth
>>>> it? That is why I’m doing the guest test, to see whether it actually
>>>> has much impact on the guest.
>>> I’ve changed the above fio invocation to use --rw=randwrite and added
>>> --fallocate=none. The performance went down, but it went down both with
>>> and without c8bb23cbdbe.
>>>
>>> So on my XFS system (XFS on luks on SSD), I see:
>>> - with c8bb23cbdbe: 26.0 - 27.9 MB/s
>>> - without c8bb23cbdbe: 25.6 - 27 MB/s
>>>
>>> On my ext4 system (native on SSD), I see:
>>> - with: 39.4 - 41.5 MB/s
>>> - without: 39.4 - 42.0 MB/s
>>>
>>> So basically no difference for XFS, and really no difference for ext4.
>>> (I ran these tests with 2 MB clusters.)
>>>
>> Hmm. I don't know. To me it seems obvious that zeroing a 2M cluster is
>> slow, and this is proved by simple tests with qemu-img bench: fallocate is
>> faster than zeroing most of the cluster.
>>
>> So, if some guest test doesn't show the difference, this means that "small
>> write into new cluster" is effectively a rare case in this test. And this
>> doesn't prove that it's always rare and insignificant.
>>
>> I'm not sure whether we have a real-world example that proves the necessity
>> of this optimization, or whether there was some original bug about low
>> performance which was fixed by this optimization.
>> Den, Anton, do we have something about it?
>>
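The underlying claim can also be illustrated outside of qemu by timing an
explicit zero write against a zero-range fallocate over the same 2M region on
the host filesystem. This is only a rough sketch on a hypothetical scratch
file, not a measurement anyone in this thread ran:

$ truncate -s 2M scratch.raw    # hypothetical scratch file
$ time dd if=/dev/zero of=scratch.raw bs=2M count=1 oflag=direct conv=notrunc
$ time fallocate --zero-range --offset 0 --length 2M scratch.raw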
> Sorry, I have missed the beginning of the thread.
>
> Which driver is used for the virtual disk, i.e. is cached or non-cached I/O
> used in QEMU? We use non-cached by default, and this could make a
> significant difference.
I’m using no cache; the above tests were done with aio=native. I’ve sent
another response with aio=threads numbers.
> Max,
>
> can you please share the domain.xml of your guest config and the fio job
> file for the guest? I will recheck to be 120% sure.
I’m running qemu directly as follows:
x86_64-softmmu/qemu-system-x86_64 \
-serial stdio \
-cdrom ~/tmp/arch.iso \
-m 4096 \
-enable-kvm \
-drive \
if=none,id=t,format=qcow2,file=test/test.qcow2,cache=none,aio=native \
-device virtio-scsi \
-device scsi-hd,drive=t \
-net user \
-net nic,model=rtl8139 \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
-cpu SandyBridge \
-nodefaults \
-nographic
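Inside the guest, /mnt/foo is presumably a file on a filesystem created on
the virtio-scsi disk; the guest-side commands are not part of this thread, so
the following is only an assumed example:

# assumed guest-side setup, not quoted from the thread
mkfs.xfs /dev/sda
mount /dev/sda /mnt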
The full FIO command line is:
fio --rw=randwrite --bs=4k --iodepth=8 --runtime=1m --direct=1 \
--filename=/mnt/foo --name=job1 --ioengine=libaio --thread \
--group_reporting --numjobs=16 --size=2G --time_based \
--output=/tmp/fio_result --fallocate=none
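As for Vladimir's earlier question about making fio write each cluster only
once (like qemu-img bench with -S 65536), fio's strided zone options might
approximate that for 64 kB clusters; this is an untested assumption about
zonemode/zonesize/zoneskip, not something anyone in this thread ran:

fio --name=once-per-cluster --filename=/mnt/foo --rw=write --bs=4k \
    --zonemode=strided --zonesize=4k --zoneskip=60k \
    --direct=1 --ioengine=libaio --iodepth=8 --size=2G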
Max