From: Jike Song
Subject: Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
Date: Thu, 05 May 2016 14:55:46 +0800
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede <address@hidden> wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +               int prot, dma_addr_t *pfn_base)
>> +{
>> +    struct vfio_iommu *iommu = iommu_data;
>> +    struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +    int i = 0, ret = 0;
>> +    long retpage;
>> +    dma_addr_t remote_vaddr = 0;
>> +    dma_addr_t *pfn = pfn_base;
>> +    struct vfio_dma *dma;
>> +
>> +    if (!iommu || !pfn_base)
>> +            return -EINVAL;
>> +
>> +    if (list_empty(&iommu->domain_list)) {
>> +            ret = -EINVAL;
>> +            goto pin_done;
>> +    }
>> +
>> +    get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +    // Return error if vGPU domain doesn't exist
> 
> No c++ style comments please.
> 
>> +    if (!domain_vgpu) {
>> +            ret = -EINVAL;
>> +            goto pin_done;
>> +    }
>> +
>> +    for (i = 0; i < npage; i++) {
>> +            struct vfio_vgpu_pfn *p;
>> +            struct vfio_vgpu_pfn *lpfn;
>> +            unsigned long tpfn;
>> +            dma_addr_t iova;
>> +
>> +            mutex_lock(&iommu->lock);
>> +
>> +            iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +            dma = vfio_find_dma(iommu, iova, 0 /*  size */);
>> +            if (!dma) {
>> +                    mutex_unlock(&iommu->lock);
>> +                    ret = -EINVAL;
>> +                    goto pin_done;
>> +            }
>> +
>> +            remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +            retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +                                              (long)1, prot, &tpfn);
>> +            mutex_unlock(&iommu->lock);
>> +            if (retpage <= 0) {
>> +                    WARN_ON(!retpage);
>> +                    ret = (int)retpage;
>> +                    goto pin_done;
>> +            }
>> +
>> +            pfn[i] = tpfn;
>> +
>> +            mutex_lock(&domain_vgpu->lock);
>> +
>> +            // search if pfn exist
>> +            if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +                    atomic_inc(&p->ref_count);
>> +                    mutex_unlock(&domain_vgpu->lock);
>> +                    continue;
>> +            }
> 
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here.  We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface.  So there's really no point at all to making type1
> become this unsupportable.  In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
> 
>> +
>> +            // add to pfn_list
>> +            lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +            if (!lpfn) {
>> +                    ret = -ENOMEM;
>> +                    mutex_unlock(&domain_vgpu->lock);
>> +                    goto pin_done;
>> +            }
>> +            lpfn->vmm_va = remote_vaddr;
>> +            lpfn->iova = iova;
>> +            lpfn->pfn = pfn[i];
>> +            lpfn->npage = 1;
>> +            lpfn->prot = prot;
>> +            atomic_inc(&lpfn->ref_count);
>> +            vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +            mutex_unlock(&domain_vgpu->lock);
>> +    }
>> +
>> +    ret = i;
>> +
>> +pin_done:
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware. As you said in another mail, "rather than programming them
into an IOMMU for a device, it simply stores the translations for use
by later requests".

That imposes a constraint on the gfx driver: the hardware IOMMU must be
disabled. Otherwise, if an IOMMU is present, the gfx driver eventually
programs the hardware IOMMU with the IOVA returned by pci_map_page or
dma_map_page, while the IOMMU backend for vgpu only maintains GPA <-> HPA
translations with no knowledge of the hardware IOMMU. In that case, how
is the device model supposed to get an IOVA for a given GPA (and thereby
for the HPA kept by the IOMMU backend here)?
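
To make that concrete (purely an illustration, not from the patch; dev
and hpa_pfn below are hypothetical names): the pfn that vfio_pin_pages()
hands back is a host physical frame, so with a hardware IOMMU enabled
the gfx driver / device model would still need a separate step like

        /* hpa_pfn: host PFN from vfio_pin_pages(); dev: the physical GPU */
        dma_addr_t iova = dma_map_page(dev, pfn_to_page(hpa_pfn), 0,
                                       PAGE_SIZE, DMA_BIDIRECTIONAL);

to obtain an address the device can actually use, and that IOVA is
invisible to the vgpu IOMMU backend's GPA <-> HPA records.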

If things go as guessed above, as vfio_pin_pages() indicates, it
pin & translate vaddr to PFN, then it will be very difficult for the
device model to figure out:

        1, for a given GPA, how to avoid calling dma_map_page multiple times?
        2, for which page to call dma_unmap_page?
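
A minimal sketch of what that bookkeeping might look like (illustrative
only; struct gfn_map and map_guest_page_once are made-up names, locking
and error handling elided):

        #include <linux/list.h>
        #include <linux/slab.h>
        #include <linux/dma-mapping.h>

        /* hypothetical per-device-model record: one entry per guest page */
        struct gfn_map {
                struct list_head next;
                unsigned long gfn;      /* guest page frame number */
                dma_addr_t iova;        /* IOVA from dma_map_page() */
                int ref;                /* times this gfn was requested */
        };

        /* map a guest page at most once; repeats just bump the refcount */
        static dma_addr_t map_guest_page_once(struct device *dev,
                                              struct list_head *maps,
                                              unsigned long gfn,
                                              struct page *page)
        {
                struct gfn_map *m;

                list_for_each_entry(m, maps, next) {
                        if (m->gfn == gfn) {
                                m->ref++;       /* already mapped */
                                return m->iova;
                        }
                }

                m = kzalloc(sizeof(*m), GFP_KERNEL);
                if (!m)
                        return 0;               /* error handling elided */

                m->gfn = gfn;
                m->iova = dma_map_page(dev, page, 0, PAGE_SIZE,
                                       DMA_BIDIRECTIONAL);
                m->ref = 1;
                list_add(&m->next, maps);
                return m->iova;
        }

The corresponding unmap side would drop the refcount and only call
dma_unmap_page() when it reaches zero.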

--
Thanks,
Jike



