qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-arm] [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device


From: Bharat Bhushan
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Date: Wed, 12 Jul 2017 10:27:31 +0000


> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:address@hidden
> Sent: Wednesday, July 12, 2017 3:48 PM
> To: Bharat Bhushan <address@hidden>; Auger Eric
> <address@hidden>; address@hidden;
> address@hidden; address@hidden; address@hidden;
> address@hidden; address@hidden
> Cc: address@hidden; address@hidden; address@hidden;
> address@hidden; address@hidden; address@hidden;
> address@hidden; address@hidden
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 04:50, Bharat Bhushan wrote:
> [...]
> >> The size of the virtio_iommu_req_probe structure is variable, and
> depends
> >> what fields the device implements. So the device initially computes the
> size it
> >> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the
> >> driver allocates that many bytes for virtio_iommu_req_probe.content[]
> >>
> >>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should
> send
> >> an
> >>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>> * The driver allocates a device-writeable buffer of probe_size (plus
> >>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>> * The device fills the buffer with various information.
> >>>>
> >>>> struct virtio_iommu_req_probe {
> >>>>  /* device-readable */
> >>>>  struct virtio_iommu_req_head head;
> >>>>  le32 device;
> >>>>  le32 flags;
> >>>>
> >>>>  /* maybe also le32 content_size, but it must be equal to
> >>>>     probe_size */
> >>>
> >>> Can you please describe why we need to pass size of "probe_size" in
> probe
> >> request?
> >>
> >> We don't. I don't think we should add this 'content_size' field unless 
> >> there
> is
> >> a compelling reason to do so.
> >>
> >>>>
> >>>>  /* device-writeable */
> >>>>  u8 content[];
> >>>
> >>> I assume content_size above is the size of array "content[]" and max
> value
> >> can be equal to probe_size advertised by device?
> >>
> >> probe_size is exactly the size of array content[]. The driver must 
> >> allocate a
> >> buffer of this size (plus the space needed for head, device, flags and 
> >> tail).
> >>
> >> Then the device is free to leave parts of content[] empty. Field 'type' 0 
> >> will
> be
> >> reserved and mark the end of the array.
> >>
> >>>>  struct virtio_iommu_req_tail tail;
> >>>> };
> >>>>
> >>>> I'm still struggling with the content and layout of the probe
> >>>> request, and would appreciate any feedback. To be easily extended, I
> >>>> think it should contain a list of fields of variable size:
> >>>>
> >>>>  |0           15|16           31|32               N|
> >>>>  |     type     |    length     |      values      |
> >>>>
> >>>> 'length' might be made optional if it can be deduced from type, but
> >>>> might make driver-side parsing more robust.
> >>>>
> >>>> The probe could either be done for each endpoint, or for each address
> >>>> space. I much prefer endpoint because it is the smallest granularity.
> >>>> The driver can then decide what endpoints to put together in the same
> >>>> address space based on their individual capabilities. The
> >>>> specification would described how each endpoint property is combined
> >>>> when endpoints are put in the same address space. For example, take
> >>>> the minimum of all PASID size, the maximum of all page granularities,
> >>>> combine doorbell addresses, etc.
> >>>>
> >>>> If we did the probe on address spaces instead, the driver would have
> >>>> to re-send a probe request each time a new endpoint is attached to an
> >>>> existing address space, to see if it is still capable of page table
> >>>> handover or if the driver just combined a VFIO and an emulated
> >>>> endpoint by accident.
> >>>>
> >>>>                                  ***
> >>>>
> >>>> Using this framework, the device can declare doorbell regions by
> >>>> adding one or more RESV fields into the probe buffer:
> >>>>
> >>>> /* 'type' */
> >>>> #define VIRTIO_IOMMU_PROBE_T_RESV        0x1
> >>>>
> >>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
> >>>> struct virtio_iommu_probe_resv {
> >>>>  le64 gpa;
> >>>>  le64 size;
> >>>>
> >>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI      0x1
> >>>>  u8 type;
> >
> > To be sure I am understanding it correctly, Is this "type" in struct
> virtio_iommu_req_head?
> 
> No, virtio_iommu_req_head::type is the request type
> (ATTACH/DETACH/MAP/UNMAP/PROBE).
> 
> Then virtio_iommu_probe_property::type is the property type (only RESV
> for
> the moment).
> 
> And this is virtio_iommu_probe_resv::type, which is the type of the resv
> region (MSI). I renamed it to 'subtype' below, but I think it still is
> pretty confusing.
> 
> 
> I did a number of changes to structures and naming when trying to
> integrate it to the specification:
> 
> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future
> extensions can add fields in the device-readable part.
> * renamed "RESV" to "RESV_MEM".
> * The resv_mem property now looks like this:
>   struct virtio_iommu_probe_resv_mem {
>         u8      subtype;
>         u8      padding[3];
>         le32    flags;
>         le64    addr;
>         le64    size;
>   };
> * subtype for MSI doorbells is now
> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
> (because transactions to this region bypass the IOMMU). 'flags' contain a
> hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this
> region is used for MSIs.
> 
> Here is an example of a probe request returning an MSI doorbell property.
> 
>      31                       7      0
>     +---------------------------------+
>     |           0            |  type  | <- request type = PROBE (5)
>     +---------------------------------+
>     |             device              |
>     +---------------------------------+
>     :                                 :
>     :          (64B padding)          :
>     :                                 :
>     +---------------------------------+
>   ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
>   | +---------------------------------+
>   | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
>  p| +---------------------------------+
>  r| |           flags = MSI           |
>  o| +---------------------------------+
>  b| |         addr = 0xfee00000       |
>  e| |                                 |
>  _| +---------------------------------+
>  s| |         size = 0x00100000       |
>  i| |                                 |
>  z| +---------------------------------+
>  e| |    length      |      type      | <- another property may start
>   | :                                 :    here
>   v :               ...               :
>     +---------------------------------+
>     |           0            | status | <- request tail
>     +---------------------------------+

So we want a single probe will return info of all "types" and each "subtype" of 
given "type"? I was of impression that based on flags there will be separate 
probe request for a type.

Thanks
-Bharat

> 
> 
> I'll try to send the next version of the spec out as soon as possible.
> 
> Thanks,
> Jean
> 
> 
> > Thanks
> > -Bharat
> >
> >>>
> >>> type is 16 bit above?
> >>
> >> Ah, the naming isn't great. This is not the same as above, and could be
> called
> >> 'subtype' to avoid confusion. The above 16-bit type describes the field
> type,
> >> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems
> >> easy to reach more than 255 kinds of endpoint properties, but
> >> 65535 should do.
> >>
> >> This subtype describes which kind of resv region is described in the
> structure.
> >> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we
> >> could for example add resv regions that the driver should never use or
> that it
> >> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
> >> Linux). I think 8 bits should be enough to contain any future types, unless
> we
> >> make this a bitfield. For identity-map, there may be an additional flags
> field
> >> describing the protection.
> >>
> >>>> };
> >>>>
> >>>> Such a region would be subject to the following rules:
> >>>>
> >>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
> >>>> * Device should leave any transaction matching a RESV_MSI region pass
> >>>> through untranslated.
> >>>> * If the device does not advertise any RESV region, then the driver
> >>>> should assume that MSI doorbells, like any other GPA, must be
> mapped
> >>>> with an arbitrary IOVA in order for the endpoint to access them.
> >>>> * Given that the driver *should* perform a probe request if
> >>>> available, and it *should* understand the
> >> VIRTIO_IOMMU_PROBE_T_RESV
> >>>> field, then this field tells the guest how it should handle MSI
> >>>> doorbells, and whether it should map the address via MAP requests or
> >> not.
> >>>>
> >>>> Does this make sense and did I overlook something?
> >>>
> >>> Overall it looks good to me. Do you have plans to implements this in
> virtio-
> >> iommu driver and kvmtool?
> >>
> >> Yes, if there is no objection I'll try to formalize it and implement it 
> >> right
> away.
> >>
> >> Thanks,
> >> Jean


reply via email to

[Prev in Thread] Current Thread [Next in Thread]