[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] Performance impact of the qcow2 overlap checks
From: |
John Snow |
Subject: |
Re: [Qemu-block] Performance impact of the qcow2 overlap checks |
Date: |
Tue, 31 Jan 2017 07:23:33 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 |
On 01/23/2017 11:29 AM, Max Reitz wrote:
> On 23.01.2017 14:30, Alberto Garcia wrote:
>> On Sat 21 Jan 2017 04:27:42 PM CET, Max Reitz wrote:
>>>>> I have optimized these checks. See:
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-block/2015-08/msg00020.html
>>>>>
>>>>> Feel free to review. If you want, I can rebase the series.
>>>>>
>>>>> So far nobody has seriously done so because it wasn't considered
>>>>> important enough, and the patches are not exactly trivial.
>>>>
>>>> I see. I wonder if it's worth complicating the code so much more
>>>> instead of simply disabling the tests.
>>>>
>>>> For the 'refcount-block' check, which is the one that has an
>>>> immediate impact in all kinds of images, I was wondering if we could
>>>> simply use the image size to determine how many entries need to be
>>>> checked. This would keep things much faster unless the image grows
>>>> abnormally bigger.
>>>
>>> Have you tested that it helps? I would assume that checking one
>>> (mostly) unused cluster of the refcount table does not take longer
>>> than checking a single refcount block. Any entry that's 0 will just be
>>> skipped.
>>
>> The problem is that there's 8000 entries to be skipped. In most cases it
>> doesn't make a difference because the bottleneck is in the hard drive,
>> but if it's fast enough then it starts to show. You can see it easily if
>> you store the qcow2 image in RAM.
>
> No, the problem is that I was just wrong. :-)
>
> By "checking a single refcount block" I meant iterating through its
> entries. But that's wrong, you don't have to do that.
>
>
> Another thing we might think about is just disabling the overlap check
> by defaults for the refcount structures. We could just emit a warning
> somewhere else when seeing overly large refcounts (i.e. greater than the
> number of internal snapshots plus one). My reasoning is the following:
>
> L1/L2 structures are more important. When overwriting them, you have a
> problem. Refcount data can always be recalculated.
>
> Refcount data will only be queried when writing data to the image. If
> that data has been overwritten, we have a chance that it is being set to
> 0 (which is rather large because 0 generally has a higher probability of
> being a part of data, admittedly). But we also have a chance that it is
> set to something else, which generally will be greater than the number
> of internal snapshots (+ 1). Therefore, such corruption should be easily
> detectable before much data is wrongly overwritten.
>
> The drawbacks with this approach would be the following:
> (1) Is printing a warning enough to make the user shut down the VM as
> fast as possible and run qemu-img check?
No.
> (2) It is legal to have a greater refcount than the number of internal
> snapshots plus one. qemu never produces such images, though (or does
> it?). Could there be existing images where users will be just annoyed by
> such warnings? Should we add a runtime option to disable them?
>
I'd assume it's not legal? If you have a refcount > 1 and delete the
last reference, wouldn't that refcount be then incorrect...?
Or I guess maybe an external tool may be playing games and using it as a
"sticky" bit of sorts?
>
> And of course another approach I already mentioned would be to scrap the
> overlap checks altogether once we have image locking (and I guess we can
> keep them around in their current form at least until then).
>
> Max
>
I think it's worth having them. Perhaps if you use image locking they
can be disabled by default, but I wouldn't really advocate for removing
them.
I think what you are pointing out with refcount blocks not being
essential to protect is probably pretty sufficient as an optimization,
too. Or I guess we can merge your ~real~ series to help optimize things.
I know in the little qcheck utility I wrote I use RBtrees of ranges that
get merged together; the tool doesn't really care which kinds of
metadata it is, it just knows that it has "metadata" that should be
protected. I think it's fast enough. If the qcow2 is kept fairly
defragmented, it's REALLY fast. IIRC, your patchset has something
similar, so it should be fast enough too.
--js
- [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/18
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/18
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/19
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/21
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/23
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/23
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/24
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/25
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/25
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/25
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks,
John Snow <=
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Alberto Garcia, 2017/01/31
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/31
- Re: [Qemu-block] Performance impact of the qcow2 overlap checks, Max Reitz, 2017/01/31