Re: [Qemu-devel] Re: Strategic decision: COW format


From: Kevin Wolf
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
Date: Wed, 23 Feb 2011 15:55:12 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10

On 23.02.2011 15:21, Anthony Liguori wrote:
> On 02/23/2011 03:13 AM, Kevin Wolf wrote:
>> On 22.02.2011 19:18, Anthony Liguori wrote:
>>    
>>> On 02/22/2011 10:15 AM, Kevin Wolf wrote:
>>>      
>>>> On 22.02.2011 16:57, Anthony Liguori wrote:
>>>>
>>>>        
>>>>> On 02/22/2011 02:56 AM, Kevin Wolf wrote:
>>>>>
>>>>>          
>>>>>> *sigh*
>>>>>>
>>>>>> It starts to get annoying, but if you really insist, I can repeat it
>>>>>> once more: These features that you don't need (this is the correct
>>>>>> description for what you call "misfeatures") _are_ implemented in a way
>>>>>> that they don't impact the "normal" case.
>>>>>>
>>>>>>            
>>>>> Except that they require a refcount table that adds additional metadata
>>>>> that needs to be updated in the fast path.  I consider that impacting
>>>>> the normal case.
>>>>>
>>>>>          
>>>> Like it or not, this requirement exists anyway, without any of your
>>>> "misfeatures".
>>>>
>>>> You chose to use the dirty flag in QED in order to avoid having to flush
>>>> metadata too often, which is an approach that any other format, even one
>>>> using refcounts, can take as well.
>>>>
>>>>        
>>> It's a minor detail, but flushing and the amount of metadata are
>>> separate points.
>>>      
>> I agree that they are separate...
>>
>>    
>>> The dirty flag prevents metadata from being flushed to disk very often
>>> but the use of a refcount table adds additional metadata.
>>>
>>> A refcount table is definitely not required even if you claim the
>>> requirement exists for other features.  I assume you mean to implement
>>> trim/discard support but instead of a refcount table, a free list would
>>> work just as well and would leave the metadata update out of the fast
>>> path (allocating writes) and instead only be in the slow path
>>> (trim/discard).
>>>      
>> ...but here you're arguing about writing metadata out in the fast path,
>> so you're actually not interested in the amount of metadata but in the
>> overhead of flushing it. Which is a problem that's solved.
>>    
> 
> I'm interested in both.  An extra write is always going to be an extra 
> write.  The flush just makes it very painful.

One extra write of 64k every 2 GB. Hardly relevant.
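As a rough check of that figure, the arithmetic can be written out. This is only a sketch: it assumes 64 KiB clusters and 16-bit refcount entries (the usual qcow2 defaults at the time), and the function name is made up for illustration.

```python
# Assumed parameters, not taken from the message itself:
CLUSTER_SIZE = 64 * 1024   # 64 KiB clusters
REFCOUNT_BITS = 16         # one 16-bit refcount entry per cluster


def refcount_block_coverage(cluster_size=CLUSTER_SIZE, refcount_bits=REFCOUNT_BITS):
    """Bytes of image data described by a single refcount block.

    A refcount block occupies one cluster and holds one entry per
    described cluster, so coverage = entries * cluster_size.
    """
    entries = cluster_size * 8 // refcount_bits
    return entries * cluster_size


coverage = refcount_block_coverage()
print(coverage // 2**30, "GiB per refcount block")
```

Under these assumptions one 64k refcount block holds 32768 entries and therefore covers 2 GiB of clusters.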

>> A refcount table is essential for internal snapshots and compression,
>> it's useful for discard and for running on block devices, it's necessary
>> for avoiding the dirty flag and fsck on startup.
>>    
> 
> No, as designed today, qcow2 still needs a dirty flag to avoid leaking 
> blocks.

I know that this is your opinion, and I do respect that; it is one of
the reasons why there is the suggestion to add the dirty flag for you.

On the other hand, it would be about time for you to accept that there
are people who think differently about it and who don't want the same as
you. This is why using the dirty flag should be optional.

>> These are five use cases that I can enumerate without thinking a lot
>> about it, there might be more. You propose using three different
>> mechanisms for allowing normal allocations (use the file size), block
>> devices (add a size field into the header) and discard (free list), and
>> the other three features, for which you can't think of a hack, you
>> declare "misfeatures".
>>    
> 
> No, I only label compression and internal snapshots as misfeatures.  
> Encryption is a completely reasonable feature.

I didn't even mention encryption. It's obvious that it's a "reasonable
feature" and not a "misfeature", because it fits relatively easily in
your QED design. :-)

The three features you don't like because they don't fit are
compression, internal snapshots and not having to fsck (thanks for
proving the latter above).

> So even with qcow3, what's the expectation of snapshots?  Are we going 
> to scale to images with over 1000 snapshots?  I believe snapshot support 
> in qcow2 is not a feature that has been designed with any serious 
> thought.  If we truly want to support internal snapshots, let's design 
> it correctly.

So what would be the key differences between your design and qcow2's? We
can always check if there's room to improve.

>>> As a format feature, a refcount table really only makes sense if the
>>> refcount is required to be greater than a single bit.  There are more
>>> optimal data structures that can be used if the refcount of a block is
>>> fixed to 1-bit (like a free list) which is what the fundamental design
>>> difference between qcow2 and qed is.
>>>      
>> Okay, so even assuming that there's something like misfeatures that we
>> can kick out (with which I strongly disagree), what's the crucial
>> advantage of free lists that would make you switch the image format?
> 
> Performance.  One thing we haven't tested with qcow2 is O_SYNC 
> performance in the guest but my suspicion is that an O_SYNC workload is 
> going to perform poorly even with cache=none.

But aren't you the one who wants to use the dirty flag in any case? The
refcounts aren't even written then.
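To make that point concrete, here is a toy model of how a dirty flag can keep refcount updates out of the write path. It is a sketch under loud assumptions: `ToyImage`, its fields and the `refcount_writes` counter are hypothetical names, and nothing here reflects the real qcow2 or QED on-disk layout.

```python
class ToyImage:
    """Toy model of a dirty flag with deferred refcount writes (hypothetical)."""

    def __init__(self):
        # 'disk' stands in for persisted state; 'l2' maps guest cluster -> host cluster.
        self.disk = {"dirty": False, "refcounts": {}, "l2": {}}
        self.refcounts = {}        # in-memory refcounts only
        self.refcount_writes = 0   # how often refcounts were written to 'disk'

    def open(self):
        if self.disk["dirty"]:
            # Unclean shutdown: on-disk refcounts are stale, so rebuild
            # them from the mapping table (the "fsck on startup").
            self.refcounts = {}
            for host in self.disk["l2"].values():
                self.refcounts[host] = self.refcounts.get(host, 0) + 1
        else:
            self.refcounts = dict(self.disk["refcounts"])
        self.disk["dirty"] = True

    def allocating_write(self, guest_cluster):
        # Assumes each guest cluster is allocated at most once.
        host = len(self.disk["l2"])                # grow the file by one cluster
        self.refcounts[host] = 1                   # updated in memory only
        self.disk["l2"][guest_cluster] = host      # mapping is persisted

    def close(self):
        # Clean shutdown: write refcounts once, then clear the flag.
        self.disk["refcounts"] = dict(self.refcounts)
        self.refcount_writes += 1
        self.disk["dirty"] = False
```

In this model allocating writes never touch the persisted refcounts; the cost moves to a clean close, or to a rebuild on the next open if the flag is still set.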

> Starting with a simple format that we don't have to jump through 
> tremendous hoops to get reasonable performance out of has a lot of virtues.

I know you don't mean it the way I read it, but it's entirely true:
You're _starting_ with a simple format, but once you add features you're
going to get something much more complex than qcow2 because you just
don't have proper cluster allocation infrastructure and need to invent
new hacks every time.
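The difference between the two allocation schemes discussed in this thread can be sketched as follows. Both classes are hypothetical toys written for illustration, not code from either format: a refcount allocator keeps one counter per cluster, while a free list only knows "in use" or "free".

```python
class RefcountAllocator:
    """Toy allocator with one reference counter per cluster (hypothetical)."""

    def __init__(self, num_clusters):
        self.refcount = [0] * num_clusters

    def alloc(self):
        idx = self.refcount.index(0)   # first free cluster; ValueError if none
        self.refcount[idx] = 1
        return idx

    def share(self, idx):
        # E.g. an internal snapshot taking a second reference.
        self.refcount[idx] += 1

    def release(self, idx):
        # E.g. discard or snapshot deletion; returns True once the cluster is free.
        self.refcount[idx] -= 1
        return self.refcount[idx] == 0


class FreeListAllocator:
    """Toy 1-bit allocator: a cluster is either in use or on the free list."""

    def __init__(self, num_clusters):
        self.free = list(range(num_clusters))

    def alloc(self):
        return self.free.pop(0)

    def release(self, idx):
        # No notion of sharing: a released cluster is immediately reusable.
        self.free.append(idx)
```

With the refcount version, a cluster shared by a snapshot survives the first `release` and only becomes reusable after the second. The free list has no way to express that, so sharing would need a separate mechanism.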

Kevin


