qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/1] block: add missed BDRV_O_NOCACHE when block


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH 1/1] block: add missed BDRV_O_NOCACHE when block device is opened without file
Date: Sat, 28 Jan 2017 17:23:30 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 25.01.2017 20:44, Denis V. Lunev wrote:
> On 01/25/2017 08:59 PM, Max Reitz wrote:
>> [CC-ing John]
>>
>> On 25.01.2017 17:42, Denis V. Lunev wrote:
>>> Technically there is a problem when the guest DVD is created by libvirt
>>> with AIO mode 'native' on Linux. Current QEMU is unable to start the
>>> domain configured as follows:
>>>     <disk type='file' device='cdrom'>
>>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>>       <target dev='sdb' bus='scsi'/>
>>>       <readonly/>
>>>     </disk>
>>> The problem comes from the combination of 'cache' and 'io' options.
>>>
>>> 'io' option is common option and it is removed from block driver
>>> specific options. 'cache' originally is not. The patch makes 'cache'
>>> option common. This works fine as long as cdrom media insertion
>>> later on.
>>>
>>> Signed-off-by: Denis V. Lunev <address@hidden>
>>> CC: Kevin Wolf <address@hidden>
>>> CC: Max Reitz <address@hidden>
>>> ---
>>>  blockdev.c | 18 +++++++++++++++---
>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>> There was a Red Hat BZ for this:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1342999
>>
>> There is now a corresponding BZ for libvirt:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1377321
>>
>> The gist is that it was determined to be a problem with libvirt.
>>
>> RHEL has a downstream commit to work around this issue by ignoring the
>> cache mode set for empty CD-ROMs -- but that is only because that was
>> simpler than fixing libvirt.
>>
>> (Note that it's ignoring the cache mode instead of actually evaluating it.)
>>
> This is what I have exactly started from:
> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
> 
> This package starts VM well for the above mentioned configuration:
> 
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <target dev='sdb' bus='scsi'/>
>       <readonly/>
>     </disk>
> 
> but the problem comes later at 'change' moment. It reports
> 
> 'Details: internal error: unable to execute QEMU command 'change':
> aio=native was specified, but it requires cache.direct=on, which
> was not specified.)'
> 
> 
> Thus partial solution implemented by the RedHat is really
> partial and does not work at the second stage.

Yes, because it is not supposed to work for that. The only thing the
downstream patch does is ignore the cache mode so you can still start
the VM even if libvirt decides to specify a cache parameter for an empty
drive (which it should not do).

>                                                I have started from
> 
> diff --git a/blockdev.c b/blockdev.c
> index d280dc4..e2c9053 100644   
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -608,6 +608,13 @@ static BlockBackend *blockdev_init(const char
> *file, QDict *bs_opts,
>              goto early_err;
>          }
>  
> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) {
> +            bdrv_flags |= BDRV_O_NOCACHE;
> +        }
> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) {
> +            bdrv_flags |= BDRV_O_NO_FLUSH;
> +        }
> +
>          blk_rs = blk_get_root_state(blk);
>          blk_rs->open_flags    = bdrv_flags;
>          blk_rs->read_only     = !(bdrv_flags & BDRV_O_RDWR);
> 
> The problem for us is that we have such previously valid configurations
> in the field.
> 
>>> May be this has already discussed, but still. AIO=native for CDROM without
>>> media seems important case.
>> First, I personally don't find aio=native very important for CD-ROM
>> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I
>> don't think aio=native will make the difference between "Slow like
>> CD-ROMs are in reality" and "As fast as virtio".
> the problem for me is that each clone() call is costly and counts. That
> is why we are trying to avoid it whenever possible. That is why 'native'
> mode is MUCH better. Also it would be very nice not to use cached
> IO, which is very good for memory overcommit situations.
> 
> Here I am fighting not with the performance, which does not make
> any real sense, but with a memory footprint.

Fair enough. Still, specifying a cache mode for an empty drive simply
doesn't make sense.

Have you considered inserting a null-co:// file at startup as a
workaround? If you use the old 'change' (or blockdev-change-medium)
command, it should retain AIO and cache mode.

>> Second, all this patch does is revert some changes done by commit
>> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
>>
>> Third, you may then be asking for the recommended way to put an
>> aio=native medium into a CD-ROM drive. Good thing you ask, because we
>> have a way that we want to recommend but can't because it's still
>> considered experimental:
>>
>> The BDS is added using blockdev-add, with all of the appropriate caching
>> and aio options. Then it's inserted into the drive using the
>> x-blockdev-insert-medium command, and the drive is closed using
>> blockdev-close-tray.
> the problem, again, is that with x-blockdev-insert-medium I can not
> deal with block driver options, or may be I am missing something.

x-blockdev-insert-medium just inserts a BDS as a medium. The BDS is
added via blockdev-add which allows you to set all of these options.

>> There are a couple of issues with this: First, blockdev-add and
>> x-blockdev-insert-medium are still experimental. The good news are that
>> I think the reason why the latter is experimental has actually
>> disappeared and we can just drop the x- by now. The
>> not-so-good-but-not-so-bad-either news are that we plan to get
>> blockdev-add declared non-experimental for 2.9. Let's see how it goes.
> Does this mean that x-blockdev-del would also lose x- prefix?

As far as I'm aware, yes.

>> The other issue is that of course it's more cumbersome to use than a
>> simple 'change' via HMP. I argue that if someone communicates with a VM
>> by hand, they either have to deal with this added complexity or they
>> cannot use aio=native for CD-ROM drives -- which, as I said, I don't
>> think is too bad.
> this is how 'virsh change-media' works at the moment, at least with
> latest downstreams.
> That is why this is not that clear.

Well, that can be easily changed, the question is how fast that will be.
You would still need to make libvirt aware of what cache mode to use for
the new medium, and that may require deeper changes.

(I supposed libvirt only stores a cache mode per drive while it should
actually store a cache mode per image.)

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]