From: Claudio Fontana
Subject: Re: starting to look at qemu savevm performance, a first regression detected
Date: Mon, 7 Mar 2022 13:26:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

On 3/7/22 1:20 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 07, 2022 at 01:09:55PM +0100, Claudio Fontana wrote:
>> On 3/7/22 1:00 PM, Daniel P. Berrangé wrote:
>>> On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
>>>> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
>>>>> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
>>>>>> Hello Daniel,
>>>>>>
>>>>>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
>>>>>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I have been looking at some reports of bad qemu savevm performance in
>>>>>>>> large VMs (around 20+ Gb),
>>>>>>>> when used in libvirt commands like:
>>>>>>>>
>>>>>>>>
>>>>>>>> virsh save domain /dev/null
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I have written a simple test to run in a Linux centos7-minimal-2009
>>>>>>>> guest, which allocates and touches 20G mem.
>>>>>>>>
>>>>>>>> With any qemu version since around 2020, I am not seeing more than 580
>>>>>>>> Mb/Sec even in the most ideal of situations.
>>>>>>>>
>>>>>>>> This drops to around 122 Mb/sec after commit:
>>>>>>>> cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>>>>>
>>>>>>>> Here is the bisection for this particular drop in throughput:
>>>>>>>>
>>>>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>> Date: Fri Feb 19 18:40:12 2021 +0000
>>>>>>>>
>>>>>>>> migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>>>>
>>>>>>>> The generic 'migrate_set_parameters' command handle all types of
>>>>>>>> param.
>>>>>>>>
>>>>>>>> Only the QMP commands were documented in the deprecations page,
>>>>>>>> but the
>>>>>>>> rationale for deprecating applies equally to HMP, and the
>>>>>>>> replacements
>>>>>>>> exist. Furthermore the HMP commands are just shims to the QMP
>>>>>>>> commands,
>>>>>>>> so removing the latter breaks the former unless they get
>>>>>>>> re-implemented.
>>>>>>>>
>>>>>>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>
>>>>>>> That doesn't make a whole lot of sense as a bisect result.
>>>>>>> How reliable is that bisect end point ? Have you bisected
>>>>>>> to that point more than once ?
>>>>>>
>>>>>> I did run through the bisect itself only once, so I'll double check that.
>>>>>> The results seem to be reproducible almost to the second though, a
>>>>>> savevm that took 35 seconds before the commit takes 2m 48 seconds after.
>>>>>>
>>>>>> For this test I am using libvirt v6.0.0.
>>>
>>> I've just noticed this. That version of libvirt is 2 years old and
>>> doesn't have full support for migrate_set_parameters.
>>>
>>>
>>>> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 :
>>>> QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0
>>>> buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
>>>> len=93 ret=93 errno=0
>>>> 2022-03-07 10:47:20.146+0000: 134386: info :
>>>> qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY:
>>>> mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class":
>>>> "CommandNotFound", "desc": "The command migrate_set_speed has not been
>>>> found"}}
>>>> 2022-03-07 10:47:20.147+0000: 134391: error :
>>>> qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU
>>>> command 'migrate_set_speed': The command migrate_set_speed has not been
>>>> found
>>>
>>> We see the migrate_set_speed failing and libvirt obviously ignores that
>>> failure.
>>>
>>> In current libvirt migrate_set_speed is not used as it properly
>>> handles migrate_set_parameters AFAICT.
>>>
>>> I think you just need to upgrade libvirt if you want to use this
>>> newer QEMU version
>>>
>>> Regards,
>>> Daniel
>>>
>>
>> Got it, this explains it, sorry for the noise on this.
>>
>> I'll continue to investigate the general issue of low throughput with virsh
>> save / qemu savevm .
>
> BTW, consider measuring with the --bypass-cache flag to virsh save.
> This causes libvirt to use an I/O helper that uses O_DIRECT when
> saving the image. This should give more predictable results by
> avoiding the influence of host I/O cache which can be in a different
> state of usage each time you measure. It was also intended that
> by avoiding hitting cache, saving the memory image of a large VM
> will not push other useful stuff out of host I/O cache which can
> negatively impact other running VMs.
>
> Also it is possible to configure compression on the libvirt side
> which may be useful if you have spare CPU cycles, but your storage
> is slow. See 'save_image_format' in the /etc/libvirt/qemu.conf
>
> With regards,
> Daniel
>
Hi Daniel, thanks for the useful information.
Regarding slow storage: for these tests I am saving to /dev/null so that
storage does not have to be taken into account
(and I am still getting low bandwidth, unfortunately), so I guess compression
is out of the question.
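
As a quick sanity check on the bandwidth figures quoted earlier in this thread, the reported save times can be converted into average throughput. This is only a sketch: it assumes the guest's "20G" means 20 GiB and that the whole 20 GiB is written during the save, neither of which is stated precisely above.

```python
# Guest memory touched by the test, as quoted in the thread.
# Treating "20G" as 20 GiB is an assumption.
GUEST_MEM_MIB = 20 * 1024


def throughput_mib_per_s(size_mib: float, seconds: float) -> float:
    """Average throughput when size_mib MiB are saved in the given wall-clock time."""
    return size_mib / seconds


# Timings quoted in the thread: 35 s before the commit, 2m 48s after it.
before = throughput_mib_per_s(GUEST_MEM_MIB, 35)
after = throughput_mib_per_s(GUEST_MEM_MIB, 2 * 60 + 48)

print(f"before: {before:.0f} MiB/s, after: {after:.0f} MiB/s")
```

Under those assumptions the two timings work out to roughly 585 MiB/s and 122 MiB/s, which is consistent with the "580" and "122" figures mentioned above.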
Thanks!
Claudio
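
For reference, the log excerpt quoted above shows the removed `migrate_set_speed` command being rejected. A minimal sketch of an equivalent request using QMP `migrate-set-parameters` with its `max-bandwidth` parameter (bytes per second) could look like the following. The helper name and the `id` string are hypothetical, the value is simply copied from the quoted log, and this is not claimed to be the exact message that newer libvirt sends.

```python
import json


def set_max_bandwidth_cmd(bytes_per_second: int, cmd_id: str) -> str:
    """Build a QMP 'migrate-set-parameters' request that sets max-bandwidth.

    Hypothetical helper for illustration only; it just serializes the JSON.
    """
    msg = {
        "execute": "migrate-set-parameters",
        "arguments": {"max-bandwidth": bytes_per_second},
        "id": cmd_id,
    }
    return json.dumps(msg)


# Value taken from the migrate_set_speed call in the quoted log;
# the id "example-1" is an arbitrary placeholder.
cmd = set_max_bandwidth_cmd(9223372036853727232, "example-1")
print(cmd)
```

The resulting line would be sent over an already negotiated QMP monitor connection; opening and negotiating that connection is left out of this sketch.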