qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, Q


From: Kevin Wolf
Subject: Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QMP docs
Date: Fri, 18 Jun 2010 11:36:50 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-2.fc12 Thunderbird/3.0.4

Am 18.06.2010 10:20, schrieb Markus Armbruster:
>>>>> -blockdev should be optimized for config files, not single
>>>>> argument input.  IOW:
>>>>>
>>>>> [blockdev "blk2"]
>>>>>    format = "raw"
>>>>>    file = "/path/to/base.img"
>>>>>    cache = "writeback"
>>>>>
>>>>> [blockdev "blk1"]
>>>>>     format = "qcow2"
>>>>>     file = "/path/to/leaf.img"
>>>>>     cache="off"
>>>>>     backing_dev = "blk2"
>>>>>
>>>>> [device "disk1"]
>>>>>     driver = "ide-drive"
>>>>>     blockdev = "blk1"
>>>>>     bus = "0"
>>>>>     unit = "0"
>>>>>      
>>>> You don't specify the backing file of an image on the command line (or
>>>> in the configuration file).
>>>
>>> But we really ought to allow it.  Backing files are implemented as part 
>>> of the core block layer, not the actual block formats.  
>>
>> The generic block layer knows the name of the backing file, so it can be
>> displayed in tools, but that's about it. Calling this the
>> "implementation" of backing files is daring.
> 
> You're both partly wrong :)

Yes, it was not entirely correct. What I was thinking of (and it is
probably what I should have written) is that the real COW implementation
is part of the driver and not generic infrastructure. If the format
driver doesn't do COW, there's no use in specifying a backing file to
block.c.

> COW format drivers retrieve the "name" of the backing file from the
> image on open, and store it in a well-known place in their
> BlockDriverState.  qcow2 also retrieves the format.
> 
> Then generic block code opens the backing "file", and stores a pointer
> to the backing BlockDriverState in the COW format's BlockDriverState.
> 
> The COW format driver then uses the backing "file" BlockDriverState
> however it sees fit.
> 
> Generic code takes care of closing it, and also has a hand in commit and
> encryption.
> 
> Aside: I put "name" and "file" in quotes, because the name is really an
> encoding of protocol + arguments.  It works just like -drive with
> file=NAME (and format=FORMAT if qcow2).  But for the COW format driver,
> it's just a string (or two) it uses to get access to a backing image.
> It's fine with any format + protocol combination, so it leaves that to
> generic code.
> 
> Aside^2: We can define the semantics of QCOW2 however we please.  But is
> it really sane to interpret a backing file name "fat:duck" we find in
> some VMDK image as "vvfat over directory duck"?

Good point, though I don't think it's a real problem. But it's true that
we use VMDK in a very qemu-specific way. We can read VMDKs that VMware
can't read (with a qcow2 backing file, for example), but in the same way
it may happen that we interpret a file name wrong if it comes from
VMware rather than qemu. Disabling protocol parsing here would disable
additional features that were possible with qemu - but to be honest, I
don't think that either option is a real problem.

>> I see no use case for specifying it on the command line. The only thing
>> you can achieve better with it is corrupting your image because you
>> specify the wrong/no backing file next time.
> 
> Anthony has a point on the conceptual level: COW is a stacking of
> BlockDriverStates.

It is. But it's not the same stacking as the format/protocol one, it's a
second dimension. I doubt that it makes sense to let the user specify
this stacking on the command line.

> But Kevin is right on usability.  We could permit overriding of backing
> file on the command line / config file / QMP, but it would be a
> dangerous expert feature, and no replacement for storing references to
> backing files in the COW images.

I even refuse to call it a feature as long as you can't tell me a use
case. I think it's useless - there are cases that are already possible
with having the backing file name stored in the image file and you just
duplicate it on the command line; and the other cases don't even work.

>>> Today the block 
>>> layer queries the block format for the name of the backing file but gets 
>>> no additional options from the block format.  File isn't necessarily 
>>> enough information to successfully open the backing device so why treat 
>>> it specially?
> 
> Parameters for opening a block device are flags, format, filename (which
> is really an encoding of protocol + arguments).  So the only piece of
> information that's missing for backing files is flags.
> 
> Flags encode -drive's parameters snapshot, cache, aio, readonly.  For
> backing files, we force readonly on and snapshot off.  We inherit cache
> and aio settings from the backed file.  Which of these do you want to
> control separately?
> 
>>> I think we should keep the current ability to query the block format for 
>>> a backing file name but we should also support hooking up the backing 
>>> device without querying the block format at all.  It makes the model 
>>> much more elegant IMHO because then we're just creating block devices 
>>> and hooking them up.  All block devices are created equal more or less.
>>>
>>>>   It's saved as part of the image. It's more
>>>> like this (for a simple raw image file):
>>>>
>>>> [blockdev-protocol "proto1"]
>>>>     protocol = "file"
>>>>     file = "/path/to/image.img"
>>>>
>>>> [blockdev "blk1"]
>>>>     format = "raw"
>>>>     cache="off"
>>>>     protocol = "proto1"
>>>>
>>>> [device "disk1"]
>>>>     driver = "ide-drive"
>>>>     blockdev = "blk1"
>>>>     bus = "0"
>>>>     unit = "0"
>>>>
>>>> (This would be Markus' option 3, I think)
>>>>    
>>>
>>> I don't understand why we need two layers of abstraction here.  Why not 
>>> just:
>>>
>>> [blockdev "proto1"]
>>>    protocol = "file"
>>>    cache = "off"
>>>    file = "/path/to/image.img"
>>>
>>> Why does the cache option belong with raw and not with file and why 
>>> can't we just use file directly?
>>
>> The cache option is shared along the chain, so it probably fits best in
>> the blockdev.
>>
>> And we don't use file directly because it's wrong. Users say that their
>> image is in raw format, and they don't get why they should have to make
>> a difference between a raw image stored on a block device and one stored
>> in a file.
>>
>>> As Christoph mentions, we really don't 
>>> have stacked protocols and I'm
> 
> "missing the end of my sentence"?
> 
> What we *really* don't have is a common understanding of "format" and
> "protocol"!
> 
>> The only question is if we call them stacked formats or stacked
>> protocols. One of them exists.
> 
> It does internally.  But until we develop a stable understanding of
> "format", "protocol" and how they stack, we better avoid exposing the
> stacking at external interfaces.

Right, we better develop a stable understand first. But that doesn't
mean that the problem is solved yet just because it's deferred.

>>>>> not sure they make sense.
>>>>>      
>>>> Right, if we go for Christoph's suggestion, we don't need stacked
>>>> protocols. We'll have stacked formats instead. I'm not sure if you like
>>>> this any better. ;-)
>>>>
>>>> We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
>>>> blkdebug on file. We need to be able to represent this.
>>>>    
>>>
>>> I think we need to do stacking in a device specific way.  When you look 
>>> at something like vmdk, it should actually support multiple leafs since 
>>> the format does support such a thing.  So what I'd suggest is:
>>>
>>> [blockdev "part1"]
>>>    format = "raw"
>>>    file = "image000.vmdk"
>>>
>>> [blockdev "part2"]
>>>    format = "raw"
>>>    file = "image001.vmdk"
>>>
>>> [blockdev "image"]
>>>    format = "vmdk"
>>>    section0 = "part1"
>>>    section1 = "part2"
>>
>> Actually, I'd prefer to read that information from the VMDK file instead
>> of requiring the user to configure this manually...
>>
>>> Note, we'll need to support this sort of model in order to support a 
>>> disk that creates an automatic partition table (which would be a pretty 
>>> useful feature). 
>>
>> Sounds like a good example of a useful protocol.
>>
>> Markus, I'm afraid we've found an equivalent to Avi's mirror. If not
>> even more complicated, because we'd need to accept any length for the
>> list of partitions - possibly an option that should take an array?
> 
> I wouldn't say it's more complicated than Avi's mirror.  The
> zero-one-infinity rule applies again: zero inferior protocols is
> simplest (no stacking), then up to one inferior protocol (protocol
> stack), then any number above one (protocol tree).

Anyway, it's more than purely theoretical, it's an example that could be
useful in practice. So when we decide the final structure (after having
developed a clear idea about formats and protocols) we should definitely
consider this.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]