Re: [Qemu-devel] [RFC] New device API


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] New device API
Date: Fri, 08 May 2009 08:47:12 -0500
User-agent: Thunderbird 2.0.0.21 (X11/20090320)

Zachary Amsden wrote:
I think the general direction is good, and this is sorely needed, but I
think having a fixed / static device struct isn't flexible enough to
handle the complexities of multiple device / bus types - for example,
MSI-X could add tons of IRQ vectors to a device, or complex devices
could have tons of MMIO regions.

I think a more flexible scheme would be to have a common device header,
and fixed substructures which are added as needed to a device.

For example, showing exploded embedded types / initialized properties.
This isn't intended to be a realistic device, or even real C code.

struct RocketController
{
    struct Device {
        struct DeviceType *base_class;
        const char *instance_name;
        DeviceProperty *configuration_data;
        struct DeviceExtension *next;
    } dev;
    struct PCIExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_PCI;
            struct DeviceExtension *next;
        }
        PCIRegs regs;
        PCIFunction *registered_functions;
        struct PCIExtension *pci_next;
    } pci;
    struct MMIOExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_MMIO;
            struct DeviceExtension *next;
        }
        target_phys_addr_t addr;
        target_phys_addr_t size;
        mmio_mapfunc cb;
    } mmio;
    struct IRQExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_IRQ;
            struct DeviceExtension *next;
        }
        int irq;
        int type = DEV_INTR_LEVEL | DEV_INTR_MSI;
    } irqs[ROCKET_CONTROLLER_MSIX_IRQS + 1];
};

I think the problem with this is that it gives too much information about CPU constructs (MMIO/IRQs) to a device that is never connected to the actual CPU.

PCI devices do not have a concept of "MMIO". They have a concept of IO regions. You can have different types of IO regions and the word sizes that are supported depend on the width of the PCI bus. Additionally, the rules about endianness of the data totally depend on the PCI controller, not the CPU itself.

This is where the current API fails miserably. You cannot have a PCI device calling the CPU MMIO registration functions directly because there is no sane way to deal with endianness conversion. Instead, the IO region registration has to go through the PCI bus.
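A rough sketch of what a bus-mediated registration could look like (the names here are made up for illustration, not an existing API):

#include <stdint.h>

typedef struct PCIDevice PCIDevice;

/* The device describes its IO region purely in PCI terms (BAR number,
 * size, memory vs. port I/O).  The PCI host bridge decides where the
 * region lands in the CPU address space and applies whatever byte
 * swapping the controller requires -- the device never sees any of that. */
typedef struct PCIIORegionOps {
    uint32_t (*read)(void *opaque, uint32_t offset, int size);
    void (*write)(void *opaque, uint32_t offset, uint32_t val, int size);
} PCIIORegionOps;

void pci_bus_register_io_region(PCIDevice *dev, int bar, uint32_t size,
                                int type,  /* memory or port I/O space */
                                const PCIIORegionOps *ops, void *opaque);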

Likewise, for IRQs, we should stick to the same principle. PCI exposes MSI and LNK interrupts to devices. We should have an API for devices to consume at the PCI level for that.
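Again only a sketch with invented names; the point is that the device speaks PCI (INTx pins, MSI vectors) and the host bridge owns the routing:

#include <stdbool.h>

typedef struct PCIDevice PCIDevice;

/* Assert or deassert one of the device's legacy interrupt pins
 * (0..3 = INTA..INTD).  How the pin is routed to a platform IRQ is
 * entirely the PCI host bridge's business. */
void pci_device_set_intx(PCIDevice *dev, int pin, bool level);

/* Fire one of the MSI vectors the device has been granted. */
void pci_device_msi_notify(PCIDevice *dev, unsigned vector);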

As Paul said, I don't think it's worth trying to make the same device model work on top of multiple busses; it's asking for trouble. Instead, devices that can connect to multiple busses (like ne2k) can have separate register_pci_device() and register_isa_device() calls and internally abstract their chipset functions. Then it only takes a small bit of ISA and PCI glue code to support both device types.
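For ne2k that would look roughly like this (sketch only, made-up names; the real code would obviously differ):

#include <stdint.h>

/* Bus-agnostic chipset core, implemented exactly once. */
typedef struct NE2000State {
    uint8_t mem[0x4000];   /* on-card packet buffer */
    /* ... registers, ring pointers, etc. ... */
} NE2000State;

uint32_t ne2000_reg_read(NE2000State *s, uint32_t addr, int size);
void ne2000_reg_write(NE2000State *s, uint32_t addr, uint32_t val, int size);

/* PCI glue: wraps the core in a PCI device and registers an IO BAR. */
void register_pci_ne2000(void);

/* ISA glue: same core, fixed ports and an ISA IRQ line instead. */
void register_isa_ne2000(void);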

1) Static per-device configuration choices, such as feature enable /
disable, framebuffer size, PCI vs. ISA bus representation.  These are
fixed choices over the lifetime of the VM.  In the above, they would be
represented as DeviceProperty configuration strings.

Yes. I think this is important too. But when we introduce this, we need to make sure the devices pre-register what strings they support and provide human-consumable descriptions of what those knobs do. Basically, we should be able to auto-extract a hardware documentation file from the device that describes all of the supported knobs in detail.
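Sketching what that could look like (names invented; the "vectors" and "bootindex" knobs below are just examples, not a real property list):

/* Each device registers the knobs it understands up front, together with
 * a human-consumable description, so a documentation file can be
 * generated mechanically instead of reverse-engineered from the parser. */
typedef struct DevicePropertyInfo {
    const char *name;         /* e.g. "vectors" */
    const char *type;         /* "integer", "string", "bool", ... */
    const char *description;  /* one-line explanation shown to users */
} DevicePropertyInfo;

static const DevicePropertyInfo rocket_properties[] = {
    { "vectors",   "integer", "Number of MSI-X vectors to expose" },
    { "bootindex", "integer", "Position in the firmware boot order" },
    { NULL, NULL, NULL }      /* terminator */
};

void device_register_properties(const char *device_name,
                                const DevicePropertyInfo *props);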

2) Dynamic per-device data.  Things such as PCI bus numbers, IRQs or
MMIO addresses.  These are dynamic and change, but have fixed, defined
layouts.  They may have to be stored in an intermediate representation
(a configuration file or wire protocol) for suspend / resume or
migration.  In the above, it would be possible to write handlers for
suspend / resume that operate on a generic device representation,
knowing to call specific handlers for PCI, etc.  Would certainly
mitigate a lot of bug potential in the dangerous intersection of ad-hoc
device growth and complex migration / suspend code.

For the most part, I think the device should be unaware of these things. It never needs to see its devfn. It should preregister which LNKs it supports and whether it supports MSI, but it should never know which IRQs those actually get routed to.
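Concretely, something like this (illustrative only, not an existing API):

#include <stdbool.h>

typedef struct PCIDevice PCIDevice;

/* The device declares what it is capable of; the platform decides the
 * routing.  Nothing here hands a devfn or a platform IRQ number back to
 * the device model. */
typedef struct PCIInterruptCaps {
    bool uses_intx;          /* drives a legacy INTx pin */
    int  intx_pin;           /* 0..3 = INTA..INTD */
    bool supports_msi;
    unsigned msi_vectors;    /* number of vectors requested */
} PCIInterruptCaps;

void pci_device_declare_interrupts(PCIDevice *dev, const PCIInterruptCaps *caps);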

3) Implementation specific data.  Pieces of data which are required for
a particular realization of a device on the host, such as file
descriptors, call back functions, authentication data, network routing.
 This data is not visible at this layer, nor do I think it should be.
Since devices may move, and virtual machines may migrate, possibly
across widely differing platforms, there could be an entirely different
realization of a device (OSS vs. ALSA sound as a trivial example).  You
may even want to dynamically change these on a running VM (SDL to VNC is
a good example).

How to represent #3 is not entirely clear, nor is it clear how to parse

We really have three types of things and it's not entirely clear to me how to name them. What we're currently calling devices are emulated hardware. Additionally, we have host drivers that provide backend functionality. This would include things like SDL, VNC, and the tap VLAN driver. We also need something like a host device. This would be the front-end functionality that connects to the host driver backend.
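In struct form, the three layers would be something like this (all names illustrative, nothing here is existing code):

/* Emulated hardware: e1000, ne2k, cirrus, ... */
typedef struct EmulatedDevice EmulatedDevice;

/* Host driver (back end): SDL, VNC, the tap VLAN driver, ... */
typedef struct HostDriver HostDriver;

/* Host device (front end): the piece that connects an emulated device
 * to a host driver back end. */
typedef struct HostDevice {
    EmulatedDevice *owner;   /* the device model this front end belongs to */
    HostDriver *driver;      /* NULL until a back end is attached */
} HostDevice;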

A device registers its creation function in its module_init(). This creation function then registers the fact that it's a PCI device and provides basic information about itself in that registration. A PCI device can be instantiated via a device configuration file, and when that happens, the device creates a host device for whatever functionality it supports. For a NIC, this would be a NetworkHostDevice or something like that. The NetworkHostDevice would have some link to the device itself (via an id, handle, whatever). A user can then create a NetworkHostDriver, attach the NetworkHostDevice to that driver, and you then have a functional emulated NIC.
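A pseudo-C walk-through of that lifecycle (every name below is hypothetical):

typedef struct PCIBus PCIBus;
typedef struct DeviceConfig DeviceConfig;
typedef struct RocketState RocketState;

typedef void (*PCICreateFunc)(PCIBus *bus, DeviceConfig *cfg);

void pci_device_register(const char *name, PCICreateFunc create);
RocketState *rocket_init(PCIBus *bus, DeviceConfig *cfg);
void network_host_device_create(const char *id, void *opaque);

/* 2. Instantiation from a device configuration file: the model is built
 *    and its NIC side is exposed as a host device (front end). */
static void rocket_create(PCIBus *bus, DeviceConfig *cfg)
{
    RocketState *s = rocket_init(bus, cfg);
    network_host_device_create("rocket.0", s);
}

/* 1. Module load time: the device only announces how it is created;
 *    rocket_register() would be wired up via module_init(). */
static void rocket_register(void)
{
    pci_device_register("rocket", rocket_create);
}

/* 3. Later, the user creates a NetworkHostDriver (tap, slirp, ...) and
 *    attaches "rocket.0" to it; only then is the NIC functional. */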

Regards,

Anthony Liguori




