qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] Performance impact of the qcow2 overlap checks


From: Max Reitz
Subject: Re: [Qemu-block] Performance impact of the qcow2 overlap checks
Date: Mon, 23 Jan 2017 17:29:49 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 23.01.2017 14:30, Alberto Garcia wrote:
> On Sat 21 Jan 2017 04:27:42 PM CET, Max Reitz wrote:
>>>> I have optimized these checks. See:
>>>>
>>>> http://lists.nongnu.org/archive/html/qemu-block/2015-08/msg00020.html
>>>>
>>>> Feel free to review. If you want, I can rebase the series.
>>>>
>>>> So far nobody has seriously done so because it wasn't considered
>>>> important enough, and the patches are not exactly trivial.
>>>
>>> I see. I wonder if it's worth complicating the code so much more
>>> instead of simply disabling the tests.
>>>
>>> For the 'refcount-block' check, which is the one that has an
>>> immediate impact in all kinds of images, I was wondering if we could
>>> simply use the image size to determine how many entries need to be
>>> checked. This would keep things much faster unless the image grows
>>> abnormally bigger.
>>
>> Have you tested that it helps? I would assume that checking one
>> (mostly) unused cluster of the refcount table does not take longer
>> than checking a single refcount block. Any entry that's 0 will just be
>> skipped.
> 
> The problem is that there's 8000 entries to be skipped. In most cases it
> doesn't make a difference because the bottleneck is in the hard drive,
> but if it's fast enough then it starts to show. You can see it easily if
> you store the qcow2 image in RAM.

No, the problem is that I was just wrong. :-)

By "checking a single refcount block" I meant iterating through its
entries. But that's wrong, you don't have to do that.


Another thing we might think about is just disabling the overlap check
by defaults for the refcount structures. We could just emit a warning
somewhere else when seeing overly large refcounts (i.e. greater than the
number of internal snapshots plus one). My reasoning is the following:

L1/L2 structures are more important. When overwriting them, you have a
problem. Refcount data can always be recalculated.

Refcount data will only be queried when writing data to the image. If
that data has been overwritten, we have a chance that it is being set to
0 (which is rather large because 0 generally has a higher probability of
being a part of data, admittedly). But we also have a chance that it is
set to something else, which generally will be greater than the number
of internal snapshots (+ 1). Therefore, such corruption should be easily
detectable before much data is wrongly overwritten.

The drawbacks with this approach would be the following:
(1) Is printing a warning enough to make the user shut down the VM as
fast as possible and run qemu-img check?
(2) It is legal to have a greater refcount than the number of internal
snapshots plus one. qemu never produces such images, though (or does
it?). Could there be existing images where users will be just annoyed by
such warnings? Should we add a runtime option to disable them?


And of course another approach I already mentioned would be to scrap the
overlap checks altogether once we have image locking (and I guess we can
keep them around in their current form at least until then).

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]