qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 14/17] memory: add MemoryRegionIOMMUOps.repla


From: Liu, Yi L
Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add MemoryRegionIOMMUOps.replay() callback
Date: Fri, 31 Mar 2017 07:30:22 +0000

> -----Original Message-----
> From: Jason Wang [mailto:address@hidden
> Sent: Friday, March 31, 2017 3:17 PM
> To: Liu, Yi L <address@hidden>; 'Peter Xu' <address@hidden>
> Cc: Lan, Tianyu <address@hidden>; Tian, Kevin <address@hidden>;
> 'address@hidden' <address@hidden>; 'address@hidden'
> <address@hidden>; 'address@hidden' <address@hidden>; 'qemu-
> address@hidden' <address@hidden>; 'address@hidden'
> <address@hidden>; 'David Gibson' <address@hidden>
> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> MemoryRegionIOMMUOps.replay() callback
> 
> 
> 
> On 2017年03月31日 13:34, Liu, Yi L wrote:
> >> -----Original Message-----
> >> From: Jason Wang [mailto:address@hidden
> >> Sent: Thursday, March 30, 2017 7:58 PM
> >> To: Liu, Yi L <address@hidden>; 'Peter Xu' <address@hidden>
> >> Cc: 'address@hidden' <address@hidden>; Lan,
> >> Tianyu <address@hidden>; Tian, Kevin <address@hidden>;
> 'address@hidden'
> >> <address@hidden>; 'address@hidden' <address@hidden>;
> >> 'address@hidden' <address@hidden>; 'David Gibson'
> >> <address@hidden>; 'address@hidden' <qemu-
> >> address@hidden>
> >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> >> MemoryRegionIOMMUOps.replay() callback
> >>
> >>
> >>
> >> On 2017年03月30日 19:06, Liu, Yi L wrote:
> >>>> -----Original Message-----
> >>>> From: Liu, Yi L
> >>>> Sent: Monday, March 27, 2017 5:22 PM
> >>>> To: Peter Xu <address@hidden>
> >>>> Cc: address@hidden; Lan, Tianyu <address@hidden>;
> >>>> Tian, Kevin <address@hidden>; address@hidden;
> >>>> address@hidden; address@hidden; address@hidden;
> >>>> David Gibson <address@hidden>; address@hidden
> >>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>> MemoryRegionIOMMUOps.replay() callback
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Peter Xu [mailto:address@hidden
> >>>>> Sent: Monday, March 27, 2017 5:12 PM
> >>>>> To: Liu, Yi L <address@hidden>
> >>>>> Cc: address@hidden; Lan, Tianyu
> >>>>> <address@hidden>; Tian, Kevin <address@hidden>;
> >>>>> address@hidden; address@hidden; address@hidden;
> >>>>> address@hidden; David Gibson <address@hidden>;
> >>>>> address@hidden
> >>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>>> MemoryRegionIOMMUOps.replay() callback
> >>>>>
> >>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Qemu-devel
> >>>>>>> [mailto:address@hidden On
> >>>>>>> Behalf Of Peter Xu
> >>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM
> >>>>>>> To: address@hidden
> >>>>>>> Cc: Lan, Tianyu <address@hidden>; Tian, Kevin
> >>>>>>> <address@hidden>; address@hidden; address@hidden;
> >>>>>>> address@hidden; address@hidden;
> >>>>>>> address@hidden; address@hidden; David Gibson
> >>>>>>> <address@hidden>
> >>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>>>>> MemoryRegionIOMMUOps.replay() callback
> >>>>>>>
> >>>>>>> Originally we have one memory_region_iommu_replay() function,
> >>>>>>> which is the default behavior to replay the translations of the
> >>>>>>> whole IOMMU region. However, on some platform like x86, we may
> >>>>>>> want our own
> >>>>> replay logic for IOMMU regions.
> >>>>>>> This patch add one more hook for IOMMUOps for the callback, and
> >>>>>>> it'll override the default if set.
> >>>>>>>
> >>>>>>> Signed-off-by: Peter Xu <address@hidden>
> >>>>>>> ---
> >>>>>>>    include/exec/memory.h | 2 ++
> >>>>>>>    memory.c              | 6 ++++++
> >>>>>>>    2 files changed, 8 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h index
> >>>>>>> 0767888..30b2a74 100644
> >>>>>>> --- a/include/exec/memory.h
> >>>>>>> +++ b/include/exec/memory.h
> >>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps {
> >>>>>>>        void (*notify_flag_changed)(MemoryRegion *iommu,
> >>>>>>>                                    IOMMUNotifierFlag old_flags,
> >>>>>>>                                    IOMMUNotifierFlag new_flags);
> >>>>>>> +    /* Set this up to provide customized IOMMU replay function */
> >>>>>>> +    void (*replay)(MemoryRegion *iommu, IOMMUNotifier
> >>>>>>> + *notifier);
> >>>>>>>    };
> >>>>>>>
> >>>>>>>    typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >>>>>>> diff --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644
> >>>>>>> --- a/memory.c
> >>>>>>> +++ b/memory.c
> >>>>>>> @@ -1630,6 +1630,12 @@ void
> >>>>>>> memory_region_iommu_replay(MemoryRegion
> >>>>>>> *mr, IOMMUNotifier *n,
> >>>>>>>        hwaddr addr, granularity;
> >>>>>>>        IOMMUTLBEntry iotlb;
> >>>>>>> +    /* If the IOMMU has its own replay callback, override */
> >>>>>>> +    if (mr->iommu_ops->replay) {
> >>>>>>> +        mr->iommu_ops->replay(mr, n);
> >>>>>>> +        return;
> >>>>>>> +    }
> >>>>>> Hi Alex, Peter,
> >>>>>>
> >>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own
> >>>>>> replay callback as well? I guess it depends on whether the
> >>>>>> original replay algorithm work well for them? Do you have such 
> >>>>>> knowledge?
> >>>>> I guess so. At least for VT-d we had this callback since the
> >>>>> default replay mechanism did not work well on x86 due to its
> >>>>> extremely large memory region size. Thanks,
> >>>> thx. that would make sense.
> >>> Peter,
> >>>
> >>> Just come to mind that there may be a corner case here.
> >>>
> >>> Intel VT-d actually has a "pt" mode which allows device use physical
> >>> address even when VT-d is enabled. In kernel, there is a
> iommu_identity_mapping.
> >>> If a device is in this map, then it would use "pt" mode. So that
> >>> IOMMU driver would not build second-level page table for it.
> >> Yes, but qemu does not support ECAP_PT now, so guest will still have
> >> a page table in this case.
> > That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. So
> > this solution can work well even a device is in identify_map.
> >
> >>> Back to the virtual IOVA implementation, if an assigned device is in
> >>> the iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to 
> >>> do
> DMA.
> >>> So it demands a GPA->HPA mapping in host. However, the
> >>> iommu->ops.replay is not able to build it when guest SL page table is 
> >>> empty.
> >>>
> >>> So I think building an entire guest PA->HPA mapping before guest
> >>> kernel boot would be recommended. Any thoughts?
> >> We plan to add PT in 2.10, a possible rough idea is disabled iommu
> >> dmar region and use another region without iommu_ops. Then
> >> vfio_listener_region_add() will just do the correct mappings.
> > Good to know it. Actually, I also need to expose ECAP_PT for vSVM. So
> > just comes to realize that the current replay solution may not work well 
> > when I
> expose ECAP_PT to guest.
> > I also have a rough idea here. The current listener in container
> > listens to address space named with devfn if virtual VTd is added. How
> > about adding one more listener to listen memory address space. So that the
> listener can build entire guest PA->HPA mapping.
> 
> This is only needed for PT. So looks like current code is sufficient to do 
> this I think.
> See the else part of if (memory_region_is_iommu()) of 
> vfio_listener_region_add().

Jason, when the listener listen to device address space, the "else part" may 
not work
even we set the mr->iommu_ops = NULL. The mr would be a non-ram region when the
time region_add is called since it is actually listen to changes from device 
address space.

Regards,
Yi L


reply via email to

[Prev in Thread] Current Thread [Next in Thread]