Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]


From: Stefan Berger
Subject: Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
Date: Thu, 15 Sep 2011 08:34:55 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110621 Fedora/3.1.11-1.fc14 Lightning/1.0b3pre Thunderbird/3.1.11

On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> <address@hidden> wrote:
>> One property of the blobstore is that it has a certain required size for
>> accommodating all blobs of the devices that want to store their blobs in it.
>> The assumption is that the size of these blobs is known a priori to the
>> writer of the device code, and all devices can register their space
>> requirements with the blobstore during device initialization. Gathering all
>> the registered blobs' sizes, plus knowing the overhead of the layout of the
>> data on the disk, lets QEMU calculate the total required (minimum) size that
>> the image has to have to accommodate all blobs in a particular blobstore.
> Libraries like tdb or gdbm come to mind.  We should be careful not to
> reinvent cpio/tar or FAT :).
Sure. As long as these DBs allow us to override open(), close(), read(), write() and seek() with bdrv ops, we could recycle any of them. Maybe we can build something smaller than those...
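
To make that concrete, here is a minimal sketch of the kind of indirection layer I have in mind; the struct and function names are made up for illustration, and the plain-fd backend merely stands in for what would really be calls into QEMU's block layer:

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical I/O vtable: a tdb/gdbm-style container would call these
     * instead of POSIX file I/O, so the same store logic could sit on top
     * of bdrv-based helpers inside QEMU. */
    typedef struct NVRAMIOOps {
        ssize_t (*pread) (void *opaque, void *buf, size_t len, off_t offset);
        ssize_t (*pwrite)(void *opaque, const void *buf, size_t len,
                          off_t offset);
        int     (*flush) (void *opaque);
    } NVRAMIOOps;

    /* Stand-in backend using a plain file descriptor; in QEMU the opaque
     * pointer would be the drive's state and the bodies would call the
     * block layer instead. */
    static ssize_t fd_pread(void *opaque, void *buf, size_t len, off_t offset)
    {
        return pread(*(int *)opaque, buf, len, offset);
    }

    static ssize_t fd_pwrite(void *opaque, const void *buf, size_t len,
                             off_t offset)
    {
        return pwrite(*(int *)opaque, buf, len, offset);
    }

    static int fd_flush(void *opaque)
    {
        return fsync(*(int *)opaque);
    }

    static const NVRAMIOOps fd_ops = {
        .pread  = fd_pread,
        .pwrite = fd_pwrite,
        .flush  = fd_flush,
    };

A store written against such an ops table would not care whether it is backed by a plain file or by a QCOW2 image opened through the block layer.
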
> What about live migration?  If each VM has a LUN assigned on a SAN
> then these qcow2 files add a new requirement for a shared file system.

Well, one can still block-migrate these. The user of course has to know whether shared storage is set up and pass the appropriate flags to libvirt for the migration. I know it works (modulo some problems when using encrypted QCOW2), since I've been testing with it.

> Perhaps it makes sense to include the blobstore in the VM state data
> instead?  If you take that approach then the blobstore will get
> snapshotted *into* the existing qcow2 images.  Then you don't need a
> shared file system for migration to work.

It could be an option. However, if the user has a raw image for the VM, we still need the NVRAM emulation, for the TPM for example. So we need to store the persistent data somewhere, but raw is not prepared for that. Even if snapshotting doesn't work at all, we need to be able to persist the devices' data.


> Can you share your design for the actual QEMU API that the TPM code
> will use to manipulate the blobstore?  Is it designed to work in the
> event loop while QEMU is running, or is it for rare I/O on
> startup/shutdown?

Everything is kind of changing now. But here's what I have right now:

    tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
    if (!tb->s.tpm_ltpms->nvram) {
        fprintf(stderr, "Could not find nvram.\n");
        return errcode;
    }

    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_PERMSTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_SAVESTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_VOLASTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));

    rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);

Above first sets up the NVRAM using the drive's id, i.e., the -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM. Subsequently the blobs to be written into the NVRAM are registered. nvram_start() then reconciles the registered NVRAM blobs with those found on disk; if everything fits together the result is 'rc = 0' and the NVRAM is ready to go. Other devices can then do the same, either with the same NVRAM or with another NVRAM. (It is called NVRAM now after renaming it from blobstore.)
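
Just to illustrate the "other devices" case: a second device would go through the same three steps. Everything below except nvram_setup(), nvram_register_blob() and nvram_start() is a placeholder (the entry type, the size and the drive id are assumptions), and whether devices sharing an NVRAM would share one VNVRAM handle or each call nvram_setup() is not shown here:

    VNVRAM *nvram;
    int errcode, rc;

    /* same drive id as given on the command line, e.g. nvram=my-bs */
    nvram = nvram_setup("my-bs", &errcode);
    if (!nvram) {
        return errcode;
    }

    /* NVRAM_ENTRY_OTHERDEV_STATE and its size are illustrative only */
    nvram_register_blob(nvram, NVRAM_ENTRY_OTHERDEV_STATE, 4096);

    rc = nvram_start(nvram, false /* fail_on_encrypted_drive */);
    if (rc) {
        /* registered blobs could not be reconciled with what is on disk */
    }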

Reading from NVRAM in case of the TPM is a rare event. It happens in the context of QEMU's main thread:

    if (nvram_read_data(tpm_ltpms->nvram,
                        NVRAM_ENTRY_PERMSTATE,
                        &tpm_ltpms->permanent_state.buffer,
                        &tpm_ltpms->permanent_state.size,
                        0, NULL, NULL) ||
        nvram_read_data(tpm_ltpms->nvram,
                        NVRAM_ENTRY_SAVESTATE,
                        &tpm_ltpms->save_state.buffer,
                        &tpm_ltpms->save_state.size,
                        0, NULL, NULL))
    {
        tpm_ltpms->had_fatal_error = true;
        return;
    }

Above reads the data of 2 blobs synchronously. This happens during startup.


Writes depend on what the user does with the TPM. The user can trigger lots of updates to persistent state by performing certain operations, e.g., persisting keys inside the TPM.

    rc = nvram_write_data(tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
                          NULL, NULL);

Above writes a TPM blob into the NVRAM. This is triggered by the TPM thread, which notifies the QEMU main thread to write the blob into the NVRAM. At the moment I do this synchronously, not using the last two parameters (a callback invoked after completion) but the two flags: the first notifies the main thread, the second waits for the completion of the request (using a condition internally).
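
For comparison, a fully asynchronous variant of the same write could hand in a completion callback instead of blocking on VNVRAM_WAIT_COMPLETION_F. Only nvram_write_data() and the NVRAMRWFinishCB signature (see the protos below) are from the proposed API; the callback body and the opaque argument are illustrative:

    static void tpm_write_done(void *opaque, int errcode, bool is_write,
                               unsigned char **data, unsigned int len)
    {
        /* opaque would point back at the TPM state; on errcode != 0 mark
           the TPM as failed, otherwise resume whatever waited for the blob */
    }

    rc = nvram_write_data(tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F,            /* notify main thread  */
                          tpm_write_done, tpm_ltpms  /* completion callback */);
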

Here are the protos:

VNVRAM *nvram_setup(const char *drive_id, int *errcode);

int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);

int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
                        unsigned int maxsize);

unsigned int nvram_get_totalsize(VNVRAM *bs);
unsigned int nvram_get_totalsize_kb(VNVRAM *bs);

typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
                             unsigned char **data, unsigned int len);

int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
                     const unsigned char *data, unsigned int len,
                     int flags, NVRAMRWFinishCB cb, void *opaque);
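
The two size helpers are what ties this back to the image size calculation mentioned at the top: once all blobs are registered, the minimum image size is known and can be checked or reported before nvram_start(). A sketch (drive_size_kb is a placeholder for however the backing drive's size would be obtained):

    unsigned int min_kb = nvram_get_totalsize_kb(nvram);

    if (drive_size_kb < min_kb) {
        fprintf(stderr,
                "NVRAM image is too small, needs at least %u kB\n", min_kb);
        return -1;
    }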


As said, things are changing right now, so this is to give an impression...

  Stefan

Stefan




