qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
Date: Thu, 8 Sep 2016 18:00:03 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Wed, Sep 07, 2016 at 08:20:35PM +1000, David Gibson wrote:
> On Wed, Sep 07, 2016 at 03:09:16PM +0800, Peter Xu wrote:
> > On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote:
> > > On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> > > > IOMMU Notifier list is used for notifying IO address mapping changes.
> > > > Currently VFIO is the only user.
> > > > 
> > > > However it is possible that future consumer like vhost would like to
> > > > only listen to part of its notifications (e.g., cache invalidations).
> > > > 
> > > > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> > > > grained control of it.
> > > > 
> > > > IOMMUNotifier contains a bitfield for the notify consumer describing
> > > > what kind of notification it is interested in. Currently two kinds of
> > > > notifications are defined:
> > > > 
> > > > - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> > > > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)
> > > 
> > > As noted on the other thread, I think the correct options for your
> > > bitmap here are "map" and "unmap".  Which are triggered depends on the
> > > permissions / existence of the *previous* mapping, as well as the new
> > > one.
> > 
> > As mentioned in previous reply, both work for me. :)
> > 
> > If you insist on changing it (without anyone that strongly
> > disagree...), I can do it in next spin.
> 
> This is not just a name change I'm proposing, but a semantic change
> (or at least a clarification).

I see that kernel IOMMU driver is using map_page() and unmap_page()
for its interface. Now I prefer MAP/UNMAP.

> 
> > > You could in fact have "map-read", "map-write", "unmap-read",
> > > "unmap-write" as separate bitmap options (e.g. changing a mapping from
> > > RO to WO would be both a read-unmap and write-map event).  I can't see
> > > any real use for that though, so just "map" and "unmap" are probably
> > > sufficient.
> > 
> > Agreed. We can enhance it in the future if there is any real
> > requirement. Before that, it would be over-engineering.
> > 
> > (Btw, we should not need {read|write}_unmap in all cases. IIUC unmap
> >  should not need any permission check.)
> 
> In practice probably not, but they are distinct operations.
> read_unmap means a readable mapping has been removed, write_unmap
> means a writable mapping has been removed.  Again - the permissions on
> the *old* mapping are what matters here.

Why they are distinct operations? Or could you help explain in what
case would we need this flag (read/write) for an unmap() operation?

> 
> > 
> > [...]
> > 
> > > > -static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
> > > >  {
> > > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > >      VFIOContainer *container = giommu->container;
> > > > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener 
> > > > *listener,
> > > >                                 section->offset_within_region;
> > > >          giommu->container = container;
> > > >          giommu->n.notify = vfio_iommu_map_notify;
> > > > +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
> > > 
> > > "caps" isn't really right.  It's a *requirement* that VFIO get all the
> > > notifications, not a capability.  "caps" would only make sense on the
> > > other side (the vIOMMU implementation).
> > 
> > Sounds reasonable. How about "flags"? Or any suggestion?
> 
> "flags" would do.  I feel like there should be a better name, but I
> can't think of it.

Sure. I can switch.

> 
> > [...]
> > 
> > > >  /**
> > > > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> > > >   * IOMMU translation entries.
> > > >   *
> > > >   * @mr: the memory region to observe
> > > > - * @n: the notifier to be added; the notifier receives a pointer to an
> > > > - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > > > - *     valid on exit from the notifier.
> > > > + * @n: the IOMMUNotifier to be added; the notify callback receives a
> > > > + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> > > > + *     ceases to be valid on exit from the notifier.
> > > >   */
> > > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier 
> > > > *n);
> > > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > > +                                           IOMMUNotifier *n);
> > > 
> > > It seems to me that this should be allowed to fail, if the notifier
> > > you're trying to register requires notifications that the MR
> > > implementation can't supply.  That seems cleaner than delaying the
> > > checking until the notification actually happens.
> > 
> > Do we have real use case for this one? For example, do we have case
> > that VM "registering required notifications that MR cannot support"
> > can still work?
> > 
> > Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU.
> > In that case, I take it a configuration error (we should never allow
> > that configuration happen). IMHO All configuration errors should be
> > reported ASAP, and we should never let VM start.
> 
> Yes... I'm not proposing changing that.  I just think it would be
> cleaner to report the error through the error channels, instead of
> just aborting.

Agree. But since I have no obvious reason to change the return code,
I'd prefer to keep it as it is for this series if you don't mind.

[...]

> > > > @@ -1564,8 +1569,22 @@ void 
> > > > memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > > >  void memory_region_notify_iommu(MemoryRegion *mr,
> > > >                                  IOMMUTLBEntry entry)
> > > >  {
> > > > +    IOMMUNotifier *iommu_notifier;
> > > > +    IOMMUNotifierCap request_cap;
> > > > +
> > > >      assert(memory_region_is_iommu(mr));
> > > > -    notifier_list_notify(&mr->iommu_notify, &entry);
> > > > +
> > > > +    if (entry.perm & IOMMU_RW) {
> > > > +        request_cap = IOMMU_NOTIFIER_CHANGE;
> > > > +    } else {
> > > > +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> > > > +    }
> > > 
> > > As noted right at the top, I don't think this logic is really right.
> > > An in-place change should be treated as both a map and unmap.
> > 
> > IIUC, guest needs to send two invalidations for an in-place change,
> > right? So a "change" will actually trigger this twice.
> 
> No, or at least not necessarily.  How the invalidate is reported
> really depends on the guest side vIOMMU interface.  Maybe it requires
> an explicit invalidate followed by a set, maybe a direct in place
> replacement is possible (the PAPR hypercall interface allows this at
> least in theory).
> 
> What I had in mind here is that assuming the vIOMMU can detect an in
> place change, it would then ping all notifiers that have *either* the
> MAP or UNMAP flag bits set.

Yes, so I think we don't need a "change" interface. And maybe we
should start using MAP/UNMAP for the flags naming to avoid unecessary
misunderstandings (when we talk about CHANGE flag, it's actually
map(), but not unmap() + map()).

> 
> > (At least this should be true for Power, and I am guessing the same
> >  for Intel, otherwise PowerIOMMU+VFIO should not work before this
> >  series...)
> 
> I don't really follow what you're saying here.

I meant at least current Power IOMMU should be sending two notifies if
a "change" happened, otherwise current code won't work.

Thanks,

-- peterx



reply via email to

[Prev in Thread] Current Thread [Next in Thread]