qemu-devel

Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Date: Fri, 10 Sep 2010 08:14:40 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6

On 09/10/2010 06:14 AM, Avi Kivity wrote:

The point of an image format is not to recreate btrfs in software. It's to provide a mechanism to allow users to move images around reasonably, but once an image is present on a reasonable filesystem, we should more or less get the heck out of the way.

You can achieve exactly the same thing with qcow2. Yes, it's more work, but it's also less disruptive to users.

This is turning dangerously close into a vbus vs. virtio discussion :-)

Let me review the motivation for QED and why we've decided incremental improvements to qcow2 were not viable.

1) qcow2 has awful performance characteristics

2) qcow2 has historically had data integrity issues. It's unclear that anyone is willing to say they're 100% confident that no data integrity issues remain in the format.

3) The users I care most about are absolutely uncompromising about data integrity. There is no room for uncertainty or trade-offs when you're building an enterprise product.

4) We have looked at trying to fix qcow2. It appears to be a monumental amount of work that starts with a rewrite where it's unclear if we can even keep supporting all of the special features. IOW, there is likely to be a need for users to experience some type of image conversion or optimization process.

5) A correct version of qcow2 has terrible performance. You need to do a bunch of fancy tricks to recover that performance. Every fancy trick needs to be carefully evaluated with respect to correctness. There's a large surface area for potential data corruptors.

We're still collecting performance data, but here's an example of what we're talking about.

FFSB random writes, MB/s (block size = 8 KB)

              Native     Raw    QCow2     QED
1 thread        30.2    24.4     22.7    23.4
8 threads      145.1   119.9     10.6   112.9
16 threads     177.1   139.0     10.1   120.9
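The size of the qcow2 gap can be checked directly from the numbers in the table. This small sketch only re-does the arithmetic on those published figures (the names `results` and `ratios` are illustrative, not from any tool):

```python
# FFSB random-write throughput in MB/s (8 KB blocks), copied from the table.
results = {
    1: {"native": 30.2, "raw": 24.4, "qcow2": 22.7, "qed": 23.4},
    8: {"native": 145.1, "raw": 119.9, "qcow2": 10.6, "qed": 112.9},
    16: {"native": 177.1, "raw": 139.0, "qcow2": 10.1, "qed": 120.9},
}


def ratios(row):
    """Return (QED / qcow2, QED / raw) throughput ratios for one thread count."""
    return row["qed"] / row["qcow2"], row["qed"] / row["raw"]


for threads, row in results.items():
    qed_vs_qcow2, qed_vs_raw = ratios(row)
    print(f"{threads:>2} threads: QED is {qed_vs_qcow2:.1f}x qcow2, "
          f"{qed_vs_raw:.0%} of raw")
```

At 8 and 16 threads the QED/qcow2 ratio comes out above 10x, while QED stays within roughly 13% of raw.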

The performance difference is an order of magnitude. qcow2 bounces all requests, needs to issue synchronous metadata updates, and only supports a single outstanding request at a time.
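One reason a single outstanding request hurts so much: in a closed-loop benchmark, throughput is capped by how many requests can be in flight at once. The sketch below is a deliberately simple bound, not a model of qcow2 itself; the 0.5 ms per-request latency is a hypothetical value, not a measurement.

```python
def throughput_mb_s(threads, max_outstanding, latency_ms, block_kb=8):
    """Upper bound on throughput for a closed-loop benchmark.

    Each request takes latency_ms to complete, and at most
    min(threads, max_outstanding) requests can be in flight at once.
    """
    in_flight = min(threads, max_outstanding)
    requests_per_s = in_flight * 1000.0 / latency_ms
    return requests_per_s * block_kb / 1024.0


LATENCY_MS = 0.5  # hypothetical per-request latency, chosen for illustration
for threads in (1, 8, 16):
    serial = throughput_mb_s(threads, max_outstanding=1, latency_ms=LATENCY_MS)
    parallel = throughput_mb_s(threads, max_outstanding=threads, latency_ms=LATENCY_MS)
    print(f"{threads:>2} threads: serialized {serial:.1f} MB/s, "
          f"concurrent {parallel:.1f} MB/s")
```

With serialization, adding threads buys nothing. The bound does not explain why the measured qcow2 numbers actually drop as threads increase; bouncing and synchronous metadata writes would need a richer model.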

With good performance and high confidence in integrity, it's a no brainer as far as I'm concerned. On one hand, we have a format that is easy to rationalize as correct and performs damn close to raw. On the other hand, we have a format that no one is confident is correct, that is even harder to rationalize as correct, and that is an order of magnitude off raw in performance.

It's really a no brainer.

The impact to users is minimal. Upgrading images to a new format is not a big deal. This isn't guest visible and we're not talking about deleting qcow2 and removing support for it.

Today, users have to choose between performance and reliability on one hand and features on the other. QED offers an opportunity to be able to tell users to just always use QED as an image format and forget about raw/qcow2/everything else.

raw will always be needed for direct volume access and shared storage. qcow2 will always be needed for old images.

My point is that for the future, the majority of people no longer have to think about "do I need performance more than I need sparse images?".

If they have some special use case, fine, but for most people we simplify their choices.

You can say, let's just make qcow2 better, but we've been trying that for years and we have an existence proof that we can do it in a straightforward fashion with QED.

When you don't use the extra qcow2 features, it has the same performance characteristics as qed.

If you're willing to leak blocks on a scale that is still unknown. It's not at all clear that making qcow2 have the same characteristics as qed is an easy problem. qed is specifically designed to avoid synchronous metadata updates. qcow2 cannot achieve that.
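Avoiding synchronous metadata updates depends on write ordering rather than on extra bookkeeping. The toy model below assumes append-only allocation at the end of the file and no refcount table; `AppendOnlyImage` and its methods are hypothetical names, not QEMU code. It "crashes" one allocating write after each possible step and checks that the lookup table never points at a cluster whose data was never written.

```python
class AppendOnlyImage:
    """Toy image: new clusters go at end of file; a table maps guest
    cluster -> image cluster. No refcounts are kept (an assumption)."""

    def __init__(self):
        self.clusters = []  # image clusters whose data has been written
        self.table = {}     # guest cluster index -> index into self.clusters

    def allocating_write_steps(self, guest_cluster, data, data_first=True):
        """Return the two steps of an allocating write in the chosen order."""
        new_index = len(self.clusters)

        def write_data():
            self.clusters.append(data)

        def update_table():
            self.table[guest_cluster] = new_index

        if data_first:
            return [write_data, update_table]
        return [update_table, write_data]

    def consistent(self):
        """Every table entry must point at a cluster that was actually written."""
        return all(idx < len(self.clusters) for idx in self.table.values())


def survives_every_crash_point(data_first):
    """Simulate a crash after 0, 1 or 2 completed steps of one allocating write."""
    for completed in range(3):
        img = AppendOnlyImage()
        steps = img.allocating_write_steps(0, b"guest data", data_first)
        for step in steps[:completed]:
            step()
        if not img.consistent():
            return False
    return True


print("data before table:", survives_every_crash_point(True))   # True
print("table before data:", survives_every_crash_point(False))  # False
```

With data written first, the worst a crash leaves behind is an unreferenced (leaked) cluster rather than a table entry pointing at garbage. The model treats each step as atomic and durable; a real implementation also has to consider whether the storage can reorder the two writes.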

You can *potentially* batch metadata updates by preallocating clusters, but what's the right amount to preallocate and is it really okay to leak blocks at that scale? It's a weak story either way. There's a burden of proof still required to establish that this would, indeed, address the performance concerns.
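The preallocation trade-off can be stated as simple arithmetic. This sketch assumes one synchronous metadata update reserves a batch of clusters, that the allocation triggering the reservation consumes the first cluster, and that a crash loses the rest of the open batch. `batched_allocation_cost` is a hypothetical helper; 64 KiB is qcow2's default cluster size.

```python
import math


def batched_allocation_cost(allocations, batch_size):
    """Model reserving clusters in batches with one synchronous update each.

    Returns (sync_updates, worst_case_leaked_clusters_per_crash).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    sync_updates = math.ceil(allocations / batch_size)
    # The reservation is made on demand, so its first cluster is used at once;
    # a crash can leak at most the remaining batch_size - 1 clusters.
    worst_case_leak = batch_size - 1
    return sync_updates, worst_case_leak


CLUSTER_KIB = 64  # qcow2 default cluster size
for batch in (1, 16, 256, 4096):
    syncs, leak = batched_allocation_cost(10_000, batch)
    print(f"batch {batch:>4}: {syncs:>5} sync updates, "
          f"up to {leak * CLUSTER_KIB} KiB leaked per crash")
```

Larger batches cut the number of synchronous updates roughly linearly while the worst-case leak grows by the same factor, which is exactly the open "what's the right amount" question.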

You need to batch allocation and freeing, but that's fairly straightforward.

Yes, qcow2 has a long and tortured history and qed is perfect. Starting from scratch is always easier and more fun. Except for the users.

The fact that you're basing your argument on "think of the users" is strange because you're advocating not doing something that is going to be hugely beneficial for our users.

You're really arguing that we should continue only offering a format with weak data integrity and even weaker performance.

A new format doesn't introduce much additional complexity. We provide an image conversion tool and we can almost certainly provide an in-place conversion tool that makes the process very fast.

It introduces a lot of complexity for the users who aren't qed experts. They need to make a decision. What's the impact of the change? Are the features that we lose important to us? Do we know what they are? Is there any risk? Can we make the change online or do we have to schedule downtime? Do all our hosts support qed?

It's very simple. Use qed, convert all existing images. Image conversion is a part of virtualization. We have tools to do it. If they want to stick with qcow2 and are happy with it, fine, no one is advocating removing it.
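For an offline, bulk conversion, the existing `qemu-img convert` command (`-f` input format, `-O` output format) is enough. The helper below only builds the command lines and does not run them; `convert_commands` and the example paths are hypothetical. Images should not be in use by a running guest while they are converted.

```python
import os
import shlex


def convert_commands(images, src_fmt="qcow2", dst_fmt="qed"):
    """Build (but do not run) one qemu-img convert command per image."""
    commands = []
    for path in images:
        stem, _ext = os.path.splitext(path)
        target = f"{stem}.{dst_fmt}"
        commands.append(
            ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, path, target]
        )
    return commands


for argv in convert_commands(["guest1.qcow2", "/var/lib/images/guest2.qcow2"]):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the output paths have been reviewed.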

We could try to solve every possible problem and have images that users can move back to arbitrarily old versions of qemu with all of the same advantages of the newer versions, but it's not realistic.

Improving qcow2 will be very complicated for Kevin, who already looks old beyond his years [1], but very simple for users.

I think we're all better off if we move past sunk costs and focus on solving other problems. I'd rather we all focus on improving performance and correctness even further than trying to make qcow2 be as good as what every other hypervisor had 5 years ago.

qcow2 has been a failure. Let's own up to it and move on. Making statements at each release that qcow2 has issues but we'll fix it soon just makes us look like we don't know what we're doing.

User confusion is reduced if we can make strong, clear statements: all users should use QED even if they care about performance. Today, there's mass confusion because of the poor state of qcow2.

If we improve qcow2 and make the same strong, clear statement we'll have the same results.

To be honest, the brand is tarnished. Once something gains a reputation for having poor integrity, it's very hard to overcome that.

Even if you have Kevin spend the next 6 months rewriting qcow2 from scratch, I'm going to have a hard time convincing customers to trust it.

All someone has to do is look at change logs to see that it has a bad history. That's more than enough to make people very nervous.

Virtualization is about compatibility. In-guest compatibility first, but keeping the external environment stable is also important. We really need to exhaust the possibilities with qcow2 before giving up on it.

IMHO, we're long past exhausting the possibilities with qcow2. We still haven't decided what we're going to do for 0.13.0.

Sorry, I disagree 100%. How can you say that, when no one has yet tried, for example, batching allocations and frees? Or properly threading it?

We've spent years trying to address problems in qcow2. And Stefan specifically has spent a good amount of time trying to fix qcow2. I know you've spent time trying to thread it too. I don't think you really grasp how difficult a problem it is to fix qcow2. It's not just that the code is bad; the format makes something that should be simple more complicated than it needs to be.

qcow2 is not a properly designed image format. It was a weekend hacking session from Fabrice that he dropped into the code base and never really finished doing what he originally intended. The improvements that have been made to it are almost at the heroic level, but we're only hurting our users by not moving on to something better.



I don't like qcow2 either. But from a performance perspective, it can be made equivalent to qed with some effort. It is worthwhile to expend that effort rather than push the burden to users.

The choices we have are: 1) provide our users a format that has high performance and good data integrity, or 2) continue to only offer a format that has poor performance and bad data integrity and promise that we'll eventually fix it.

We've been doing (2) for too long now. We need to offer a solution to users today. It's not fair to our users to not offer them a good solution just because we don't want to admit to previous mistakes.

If someone can fix qcow2 and make it competitive, by all means, please do.

Regards,

Anthony Liguori

[1] okay, maybe not.