Re: [Qemu-block] Performance impact of the qcow2 overlap checks


From: Alberto Garcia
Subject: Re: [Qemu-block] Performance impact of the qcow2 overlap checks
Date: Wed, 25 Jan 2017 16:27:14 +0100
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Wed 25 Jan 2017 03:03:36 PM CET, Max Reitz wrote:
>>> (2) It is legal to have a greater refcount than the number of
>>> internal snapshots plus one. qemu never produces such images, though
>>> (or does it?). Could there be existing images where users will be
>>> just annoyed by such warnings? Should we add a runtime option to
>>> disable them?
>> 
>> I don't think it's legal, or is there any reason why it would be?
>
> Yes, the spec doesn't disallow it, so it's legal. :-)
>
> I had thought about using it for the conversion of zero clusters to
> data clusters in compat-level downgrading. You would only create a
> couple of zero-filled data clusters and then just reference them very
> often.

Ok, that's reason enough: it's legal and it could be used at some point.

> I think it could make sense to limit refcounts to snapshot count + 1
> in the spec -- the only issue I take with that is that I'm not sure
> "better corruption detection" is a good reason to do so.

I agree.

> Also, while it makes sense to do the change to the spec *now*, it may
> be difficult to revert in the future if needed. Let's say someone
> comes up with a valid reason for wanting to use refcounts for
> something other than internal snapshots. Then, we could simply allow
> arbitrary refcounts again. But QEMUs from before that change may then
> report such new images as corrupted all the time.

I also agree.

> We could circumvent this by adding a compatible
> has-arbitrary-refcounts flag at the same time. If that flag is set,
> refcounts may be arbitrary; otherwise, they are limited by the
> snapshot count.
>
> But again, that seems a bit much just so we can detect refcount
> overwrites...

Yes, it seems a bit too much :)

>>> And of course another approach I already mentioned would be to scrap
>>> the overlap checks altogether once we have image locking (and I
>>> guess we can keep them around in their current form at least until
>>> then).
>> 
>> I think the overlap checks are fine, at least in my tests I only
>> found problems with one of them, and only in some scenarios(*). So if
>> we cannot optimize them easily I'd simply tell the user about the
>> risks and suggest disabling them. Maybe all we really need is good
>> documentation.
>
> Well, overlap-check=none should be simple enough.

I was thinking more of a short document detailing the overlap checks,
what exactly they do and how to disable them. It could go in the wiki or
in a blog post. To be honest, I only learnt about them (and that they're
enabled by default) when I was profiling the qcow2 driver.

>> I think the most obvious candidate for optimization is
>> refcount-block, and as I said it's the check that would create the
>> bottleneck in most common scenarios. The optimization is simple: if
>> the size of the qcow2 image is 7GB then you only need to check the
>> first 4 entries in the refcount table.
>> 
>> I can think of two problems with this, which is why I haven't sent a
>> patch yet:
>> 
>> (1) This needs the current size of the image every time we want to
>>     perform that check, and that means I/O.
>
> Could be cached (every time some refcount is updated, you update a
> max_refcounted_cluster variable in the qcow2 state), but that means
> code.

That actually sounds simple enough. I can give it a try to see if it's
worth doing, because I'd rather not do anything that needs too many
changes.
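Just to make the idea concrete, here's a rough standalone sketch of the
calculation (all names and the exact formula are mine, this is not
actual QEMU code):

/* Rough sketch: limit the refcount-block overlap check to the refcount
 * table entries that can actually be in use, based on a cached "highest
 * cluster with a nonzero refcount" as Max suggested. */

#include <inttypes.h>
#include <stdio.h>

/* Clusters covered by one refcount block: each block holds
 * cluster_size * 8 / refcount_bits entries, one per cluster. */
static uint64_t clusters_per_refcount_block(uint64_t cluster_size,
                                            unsigned refcount_bits)
{
    return cluster_size * 8 / refcount_bits;
}

/* Number of refcount table entries the check has to scan, given the
 * highest cluster with a nonzero refcount.  Entries beyond that cannot
 * point to a refcount block that is in use, so they can be skipped. */
static uint64_t refcount_entries_to_check(uint64_t max_refcounted_cluster,
                                          uint64_t cluster_size,
                                          unsigned refcount_bits)
{
    uint64_t per_block = clusters_per_refcount_block(cluster_size,
                                                     refcount_bits);
    return max_refcounted_cluster / per_block + 1;
}

int main(void)
{
    uint64_t cluster_size = 65536;   /* default 64k clusters */
    unsigned refcount_bits = 16;     /* default 16-bit refcounts */

    /* 7 GB image: the highest cluster index is just below 7 GB / 64k. */
    uint64_t max_cluster = (7ULL << 30) / cluster_size - 1;

    /* One refcount table entry covers 32768 * 64k = 2 GB here, so this
     * prints 4, matching the "first 4 entries" example above. */
    printf("entries to check: %" PRIu64 "\n",
           refcount_entries_to_check(max_cluster, cluster_size,
                                     refcount_bits));
    return 0;
}

And if the cached maximum is a bit too high after clusters are freed the
check is still correct, just slightly less effective, so the update
logic can stay simple.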

>> (*) I actually noticed (I'm talking about a qcow2 image stored in RAM
>> now) that disabling the refcount-block check dramatically increases
>> the number of IOPS (+90%) when using virtio-blk, but doesn't seem to
>> have any effect (my tests even show a slightly negative effect!!) when
>> using virtio-scsi. Does that make sense? Am I hitting a SCSI limit or
>> what would be the reason for this?
>
> That's for someone else to answer. O:-)

I'll see what I can find.

Berto


