Re: [Qemu-devel] device assignment for embedded Power


From: Alexander Graf
Subject: Re: [Qemu-devel] device assignment for embedded Power
Date: Fri, 1 Jul 2011 13:40:22 +0200

On 01.07.2011, at 02:58, Benjamin Herrenschmidt wrote:

> On Thu, 2011-06-30 at 15:59 +0000, Yoder Stuart-B08248 wrote:
>> One feature we need for QEMU/KVM on embedded Power Architecture is the 
>> ability to do passthru assignment of SoC I/O devices and memory.  An 
>> important use case in embedded is creating static partitions-- 
>> taking physical memory and I/O devices (non-PCI) and partitioning
>> them between the host Linux and several virtual machines.   Things like
>> live migration would not be needed or supported in these types of scenarios.
>> 
>> SoC devices do not sit on a probeable bus and there are no identifiers 
>> like 01:00.0 with PCI that we can use to identify devices--  the host
>> Linux kernel is made aware of SoC I/O devices from nodes/properties in a 
>> device tree structure passed at boot.   QEMU needs to generate a
>> device tree to pass to the guest as well with all the guest's virtual
>> and physical resources.  Today a number of mostly complete guest device
>> trees are kept under ./pc-bios in QEMU, but this is too static and
>> inflexible.
>> 
>> Some new mechanism is needed to assign SoC devices to guests, and we
>> (FSL + Alex Graf) have been discussing a few possible approaches
>> for doing this from QEMU and would like some feedback.
>> 
>> Some possibilities:
>> 
>> 1. Option 1.  Pass the host dev tree to QEMU and assign devices
>>   by device tree path
>> 
>>     -dtb ./mpc8572ds.dtb -device assigned-soc-dev,dev=/soc/address@hidden
>> 
>>   /soc/address@hidden is the device tree path to the assigned device.
>>   The device node 'address@hidden' has some number of properties (e.g. 
>>   address, interrupt info) and possibly subnodes under
>>   it.   QEMU copies that node when generating the guest dev tree.
>>   See snippet of entire node:  http://paste2.org/p/1496460
> 
> Yuck (see below)
> 
>> 2. Option 2.  Pass the entire assigned device node as a string to
>>   QEMU
>> 
>>     -device assigned-soc-dev,dev=/address@hidden,dev-node='#address-cells = <1>;
>>      #size-cells = <0>; cell-index = <0>; compatible = "fsl-i2c";
>>      reg = <0xffe03000 0x100>; interrupts = <43 2>;
>>      interrupt-parent = <&mpic>; dfsrr;'
> 
> Beuark ! (see below)
> 
>>   This avoids needing to pass the host device tree, but could 
>>   get awkward-- the i2c example above is very simple, some device
>>   nodes are very large with a complex hierarchy of subnodes and 
>>   could be hundreds of lines of text to represent a single
>>   node.
>> 
>> It gets more complicated...
> 
> 
> So, from a qemu command line perspective, all you should have to do is
> pass qemu the device-tree -path- to the device you want to pass-through
> (you may support passing a full hierarchy here).
> 
> That is for normal MMIO mapped SoC devices. Something else (individual
> i2c, usb, ...) will use specific virtualization of the corresponding
> busses.
> 
> Anything else sucks too much really.
> 
> From there, well, there are several approaches inside qemu/kvm to handle
> that path. If you want to do things at the qemu level you can probably
> parse /proc/device-tree. But I'd personally just make it a kernel thing.
> 
> I.e. I would have an ioctl to "instantiate" a pass-through device, that
> takes that path as an argument. I would make it return an anonymous fd
> which you can then use to mmap the resources, etc...

Yeah, one idea was to use VFIO here. We could for example modify the host 
device tree to claim the devices we want to pass through with a specific 
compatible property. Or we could try to steal the node at runtime. But 
I agree, reading the device tree data from a VFIO node sounds reasonable, if 
it's required.
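
For reference, the qemu-level alternative mentioned above (parsing 
/proc/device-tree) could look roughly like the sketch below. It relies only 
on the fact that the kernel exposes each node as a directory and each property 
as a file of raw bytes, with cells stored as 32-bit big-endian values. The 
names read_node and cells, and the node path in the usage note, are made up for 
illustration and are not part of QEMU or of any proposal in this thread.

```python
import os
import struct


def read_node(root, path):
    """Read one node from a device tree exposed as a directory tree.

    root is the mount point (for example /proc/device-tree) and path is
    the device tree path of the node. Returns (props, subnodes), where
    props maps property names to raw bytes and subnodes maps child names
    to their own (props, subnodes) pairs.
    """
    node_dir = os.path.join(root, path.lstrip("/"))
    props, subnodes = {}, {}
    for name in sorted(os.listdir(node_dir)):
        full = os.path.join(node_dir, name)
        if os.path.isdir(full):
            # A directory is a child node: recurse so the whole
            # hierarchy below the assigned device gets copied.
            subnodes[name] = read_node(root, os.path.join(path, name))
        else:
            # A regular file is a property holding raw bytes.
            with open(full, "rb") as f:
                props[name] = f.read()
    return props, subnodes


def cells(raw):
    """Decode a property value as a list of 32-bit big-endian cells."""
    return list(struct.unpack(">%dI" % (len(raw) // 4), raw))
```

With a hypothetical node path, read_node("/proc/device-tree", 
"/soc/example@0") would return the properties and subnodes to copy into the 
guest tree, and cells(props["reg"]) would give the address/size pairs to map.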

> 
>> In some cases, modifications to device tree nodes may be needed.
>> An example-- sometimes a device tree property references another node 
>> and that relationship may not exist when assigned to a guest.
>> A "phy-handle" property may need to be deleted and a "fixed-link"
>> property added to a node representing a network device.
> 
> That's fishy. Why wouldn't you give full access to the MDIO ? It's
> shared ? Such things are so device-specific that they would have to be
> handled by device-specific quirks, which can live either in qemu or in
> the kernel.

Hrm, so you'd create a separate device for MDIO which can do pass-through of 
those?

> 
>> So in addition to assigning a device, a mechanism is needed to update 
>> device tree nodes.  So for the above example, maybe--
>> 
>> -device assigned-soc-dev,dev=/soc/address@hidden,delete-prop=phy-handle,
>>  node-update="fixed-link = <2 1 1000 0 0>"
> 
> That's just so gross and error prone, borderline insane.

Alternatives:

  * not modify the device tree (unlikely to work)
  * pass a full device tree chunk to qemu instead of modification commands
  * ?
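
To make the comparison concrete, here is a minimal sketch of what the 
modification commands boil down to, and of how the result could be rendered as 
a device tree chunk instead. It assumes a node is held as a plain dict of 
property name to value (None for an empty property, a string, or a list of 
integer cells). apply_updates and to_dts are hypothetical helpers, not existing 
QEMU code, and phandle references such as <&mpic> are not handled.

```python
def apply_updates(node, delete_props=(), set_props=None):
    """Return a copy of node with some properties deleted, then others set."""
    updated = dict(node)
    for name in delete_props:
        # Deleting a property that is not present is not an error here.
        updated.pop(name, None)
    updated.update(set_props or {})
    return updated


def to_dts(props):
    """Render a property dict as a one-line device tree source fragment."""
    parts = []
    for name, value in props.items():
        if value is None:
            # Empty (boolean) property, e.g. "dfsrr;"
            parts.append("%s;" % name)
        elif isinstance(value, str):
            parts.append('%s = "%s";' % (name, value))
        else:
            # List of integer cells, e.g. "interrupts = <43 2>;"
            parts.append("%s = <%s>;" % (name, " ".join(str(c) for c in value)))
    return " ".join(parts)
```

For the network device example, apply_updates(node, ["phy-handle"], 
{"fixed-link": [2, 1, 1000, 0, 0]}) drops the phy-handle and adds the 
fixed-link property, and to_dts() of the result is the chunk that would be 
passed instead of the individual commands.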

> 
>> The types of modifications needed--  deleting nodes, deleting properties, 
>> adding nodes, adding properties, adding properties that reference other
>> nodes, changing properties. The device tree transformation mechanism
>> needed is general enough that it could apply to any device-tree-based
>> embedded platform (e.g. ARM, MIPS).
>> 
>> Another complexity relates to the IOMMU.  Here things get very company 
>> and IOMMU specific. Freescale has a proprietary IOMMU.
> 
> Look at the work currently being done for a generic qemu iommu layer. We
> need it for server power as well and from what I last saw coming from
> Eduardo and David, it's not PCI specific.

Well, but it only implements an IOMMU emulation layer inside the guest. That 
doesn't help us for the host side of things unfortunately :).


Alex



