qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] blobstore disk format (was Re: Design of the blobstore)


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] blobstore disk format (was Re: Design of the blobstore)
Date: Mon, 19 Sep 2011 22:04:13 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 19, 2011 at 12:22:02PM -0400, Stefan Berger wrote:
> On 09/17/2011 03:28 PM, Michael S. Tsirkin wrote:
> >On Fri, Sep 16, 2011 at 12:46:40PM -0400, Stefan Berger wrote:
> >>On 09/16/2011 10:44 AM, Michael S. Tsirkin wrote:
> >>>On Thu, Sep 15, 2011 at 10:33:13AM -0400, Stefan Berger wrote:
> >>>>On 09/15/2011 08:28 AM, Michael S. Tsirkin wrote:
> >>>>>So the below is a proposal for a directory scheme
> >>>>>for storing (optionally multiple) nvram images,
> >>>>>along with any metadata.
> >>>>>Data is encoded using BER:
> >>>>>http://en.wikipedia.org/wiki/Basic_Encoding_Rules
> >>>>>Specifically, we mostly use the subsets.
> >>>>>
> >>>>Would it change anything if we were to think of the NVRAM image as
> >>>>another piece of metadata?
> >>>Yes, we can do that, sure. I had the feeling that it will help to lay
> >>>out the image at the end, to make directory listing
> >>>more efficient - the rest of metadata is usually small,
> >>>image might be somewhat large.
> >>>
> >>Why not let a convenience library handle the metadata on the device
> >>level, having it create the blob that the NVRAM layer ends up
> >>writing and parsing before the device uses it? Otherwise I should
> >>maybe rename the nvram to meatdata_store :-/
> >Maybe we are talking about different things. All I agrue for
> >is using a common standard format for storing metadata,
> >instead of having each device roll its own.
> That's fine. The TPM code inside libtpms serializes all internal
> data structures for later resumption. It doesn't use ASN.1 but in
> effect endianess-normalizes them and stores them in a packed format
> that can later be resumed along with a version tag prepended where
> need . Are you suggesting to change that ? I hope not...

I'm guessing that it makes sense to use whatever libtpms does
as is - if we change the format we'd have to parse it ourselves.
We can use it as a binary blob.
It's only when we add our own versioning/structure
on top of it, such as adding actual size, where
using a standard format makes sense IMO.

> >>>>I am also wondering whether each device shouldn't just handle the
> >>>>metadata itself,
> >>>It could be that just means we will have custom code with
> >>>different bugs in each device.
> >>>Note that from experience with formats, the problem with
> >>>time becomes less trivial than it seems as we
> >>>need to provide forward and backward compatibility
> >>>guarantees.
> >>>
> >>Is that guaranteed just by using ASN.1 ?
> >At least for BER, yes. We can always skip an optional field
> >that we don't recognize without knowing anything about
> >its internal format.
> 
> >>Do we need to add a
> >>revision to the metadata?
> >IMO, no. Instead we add optional attributes as long as we can
> >preserve backwards compatibility, and madatory attributes
> >if we can't.
> >
> Are devices doing this right now or are these future changes to
> devices' code?

The don't do right now.
AFAIK Anthony's working on changing devices to use a visitor
for migration, and I want to add a visitor backend
that will use BER format.

> >>>>encryption, integrity value (crc32 or sha1) and so on. What
> >>>>metadata should there be that really need to be handled on the NVRAM
> >>>>API and below level rather than on the device-specific code level?
> >>>So checksum  (checksum value and type) 'and so on' are what I call
> >>>metadata :) Doing it at device level seems wrong.
> >>>
> >>You mean doing it at the NVRAM level seems wrong. Of course, again
> >>something a device could write into a header prepended to the actual
> >>blob. Maybe every device that needs it should do that so that if we
> >>were to support encryption of blobs and the key for decryption was
> >>wrong one could detect it early without feeding badly decrypted /
> >>corrupted state into the device and see what happens.
> >Do what? Checksum the data? Well, error detection is nice,
> >but it could be that people actually care about not losing
> >all of the data on nvram if qemu is killed.  I also wonder whether
> >invalidating all data because of a single bit corruption is a bug or a
> >feature.
> >
> The checksuming I think makes sense if encryption is being added so
> decryption and testing for proper key material remains an NVRAM
> operation rather than a device operation.

Not sure how this addresses the question of what to do
on checksum failure.

> >>>>>We use a directory as a SET in a CER format.
> >>>>>This allows generating directory online without scanning
> >>>>>the entries beforehand.
> >>>>>
> >>>>I guess it is the 'unknown' for me... but what is the advantage of
> >>>>using ASN1 for this rather than just writing out packed and
> >>>>endianess-normalized data structures (with revision value),
> >>>If you want an example of where this 'custom formats are easy
> >>>so let us write one' leads to in the end,
> >>>look no further than live migration code.
> >>>It's a mess of hacks that does not even work across
> >>>upstream qemu versions, leave alone across
> >>>downstreams (different linux distros).
> >>>
> >>So is ASN1 the answer or does one still need to add a revision tag
> >>to each blob putting in custom code for parsing the different
> >>revisions of data structures (I guess) that may be extended/changed
> >>over time?
> >>
> >>    Stefan
> >We don't need revisions. We can always parse a new structure
> >skipping optional attributes we don't recognize. In case we want to
> >break old qemu versions intentially, we can add
> >a mandatory attribute.
> So you said you had some code for the handling of ASN.1. Can sketch
> how the interaction of devices would work with mandatory and
> optional attributes along with an API?

So for now what I'm happy with is a low level BER parser API.
NVRAM would implement a scheme on top of that
and bdrv. I'll pack up what I have and post ASAP.

> I'd prefer to NOT have the
> attributes and values be a part of the NVRAM API itself but let a
> (mandatory) library handle the serialization and deserialization of
> these metadata when a device wants to write or read state
> respectively. But maybe I just want to keep the NVRAM API 'too
> simple'.
> 
>    Stefan

I'm fine with that. Still tpm has some device specific
things like 'actual size' so the api will need to address that
in some generic way.

> >>>>having
> >>>>them crc32-protected to have some sanity checking in place?
> >>>>
> >>>>     Stefan
> >>>I'm not sure why we want crc specifically in TPM.
> >>>If it is 'just because we can' then it probably
> >>>applies to other non-volatile storage?
> >>>Storage generally?
> >>>
> >>>>>The rest of the encoding uses a DER format.
> >>>>>This makes for fast parsing as entries are easy to skip.
> >>>>>
> >>>>>Each entry is encoded in DER format.
> >>>>>Each entry is a SEQUENCE with two objects:
> >>>>>1. nvram
> >>>>>2. optional name - a UTF8String
> >>>>>
> >>>>>Binary data is stored as OCTET-STRING values on disk.
> >>>>>Any RW metadata is stored as OCTET-STRING value as well.
> >>>>>Any RO metadata is stored in appropriate universal encoding,
> >>>>>by type.
> >>>>>
> >>>>>On the context below, an attribute is either a IA5String or a SEQUENCE.
> >>>>>If IA5String, this is the attribute name, and it has no value.
> >>>>>If SEQUENCE, the first entry in the sequence is an
> >>>>>IA5String, it is the attribute name. The rest of the entries
> >>>>>represent the attribute value.
> >>>>>
> >>>>>Mandatory/optional attributes: depends on type.
> >>>>>tpm will have realsize as RW mandatory attribute.
> >>>>>
> >>>>>Each nvram is built as a SEQUENCE including 4 objects
> >>>>>1. type - an IA5String. downstreams can use other types such as
> >>>>>                      UUIDs instead to ensure no conflicts with upstream
> >>>>>2. SET of mandatory attributes
> >>>>>3. SET of optional attributes
> >>>>>4. data - a RW OCTET-STRING
> >>>>>
> >>>>>It is envisioned that attributes won't be too large,
> >>>>>so they can easily be kept in memory.
> >>>>>
> >>>>>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]