qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
Date: Wed, 7 Sep 2016 20:20:35 +1000
User-agent: Mutt/1.7.0 (2016-08-17)

On Wed, Sep 07, 2016 at 03:09:16PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote:
> > On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> > > IOMMU Notifier list is used for notifying IO address mapping changes.
> > > Currently VFIO is the only user.
> > > 
> > > However it is possible that future consumer like vhost would like to
> > > only listen to part of its notifications (e.g., cache invalidations).
> > > 
> > > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> > > grained control of it.
> > > 
> > > IOMMUNotifier contains a bitfield for the notify consumer describing
> > > what kind of notification it is interested in. Currently two kinds of
> > > notifications are defined:
> > > 
> > > - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> > > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)
> > 
> > As noted on the other thread, I think the correct options for your
> > bitmap here are "map" and "unmap".  Which are triggered depends on the
> > permissions / existence of the *previous* mapping, as well as the new
> > one.
> 
> As mentioned in previous reply, both work for me. :)
> 
> If you insist on changing it (without anyone that strongly
> disagree...), I can do it in next spin.

This is not just a name change I'm proposing, but a semantic change
(or at least a clarification).

> > You could in fact have "map-read", "map-write", "unmap-read",
> > "unmap-write" as separate bitmap options (e.g. changing a mapping from
> > RO to WO would be both a read-unmap and write-map event).  I can't see
> > any real use for that though, so just "map" and "unmap" are probably
> > sufficient.
> 
> Agreed. We can enhance it in the future if there is any real
> requirement. Before that, it would be over-engineering.
> 
> (Btw, we should not need {read|write}_unmap in all cases. IIUC unmap
>  should not need any permission check.)

In practice probably not, but they are distinct operations.
read_unmap means a readable mapping has been removed, write_unmap
means a writable mapping has been removed.  Again - the permissions on
the *old* mapping are what matters here.

> 
> [...]
> 
> > > -static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
> > >  {
> > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > >      VFIOContainer *container = giommu->container;
> > > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener 
> > > *listener,
> > >                                 section->offset_within_region;
> > >          giommu->container = container;
> > >          giommu->n.notify = vfio_iommu_map_notify;
> > > +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
> > 
> > "caps" isn't really right.  It's a *requirement* that VFIO get all the
> > notifications, not a capability.  "caps" would only make sense on the
> > other side (the vIOMMU implementation).
> 
> Sounds reasonable. How about "flags"? Or any suggestion?

"flags" would do.  I feel like there should be a better name, but I
can't think of it.

> [...]
> 
> > >  /**
> > > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> > >   * IOMMU translation entries.
> > >   *
> > >   * @mr: the memory region to observe
> > > - * @n: the notifier to be added; the notifier receives a pointer to an
> > > - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > > - *     valid on exit from the notifier.
> > > + * @n: the IOMMUNotifier to be added; the notify callback receives a
> > > + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> > > + *     ceases to be valid on exit from the notifier.
> > >   */
> > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier 
> > > *n);
> > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > +                                           IOMMUNotifier *n);
> > 
> > It seems to me that this should be allowed to fail, if the notifier
> > you're trying to register requires notifications that the MR
> > implementation can't supply.  That seems cleaner than delaying the
> > checking until the notification actually happens.
> 
> Do we have real use case for this one? For example, do we have case
> that VM "registering required notifications that MR cannot support"
> can still work?
> 
> Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU.
> In that case, I take it a configuration error (we should never allow
> that configuration happen). IMHO All configuration errors should be
> reported ASAP, and we should never let VM start.

Yes... I'm not proposing changing that.  I just think it would be
cleaner to report the error through the error channels, instead of
just aborting.

> > >  /**
> > >   * memory_region_iommu_replay: replay existing IOMMU translations to
> > > @@ -636,7 +659,8 @@ void 
> > > memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > >   * @is_write: Whether to treat the replay as a translate "write"
> > >   *     through the iommu
> > >   */
> > > -void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool 
> > > is_write);
> > > +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> > > +                                bool is_write);
> > >  
> > >  /**
> > >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> > > @@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, 
> > > Notifier *n, bool is_write);
> > >   *      needs to be called
> > >   * @n: the notifier to be removed.
> > >   */
> > > -void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier 
> > > *n);
> > > +void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
> > > +                                             IOMMUNotifier *n);
> > >  
> > >  /**
> > >   * memory_region_name: get a memory region's name
> > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > > index 94dfae3..c17602e 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ b/include/hw/vfio/vfio-common.h
> > > @@ -93,7 +93,7 @@ typedef struct VFIOGuestIOMMU {
> > >      VFIOContainer *container;
> > >      MemoryRegion *iommu;
> > >      hwaddr iommu_offset;
> > > -    Notifier n;
> > > +    IOMMUNotifier n;
> > >      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> > >  } VFIOGuestIOMMU;
> > >  
> > > diff --git a/memory.c b/memory.c
> > > index 0eb6895..45a3902 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1418,7 +1418,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
> > >      memory_region_init(mr, owner, name, size);
> > >      mr->iommu_ops = ops,
> > >      mr->terminates = true;  /* then re-forwards */
> > > -    notifier_list_init(&mr->iommu_notify);
> > > +    QLIST_INIT(&mr->iommu_notify);
> > >  }
> > >  
> > >  static void memory_region_finalize(Object *obj)
> > > @@ -1513,13 +1513,16 @@ bool memory_region_is_logging(MemoryRegion *mr, 
> > > uint8_t client)
> > >      return memory_region_get_dirty_log_mask(mr) & (1 << client);
> > >  }
> > >  
> > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > +                                           IOMMUNotifier *n)
> > >  {
> > > +    /* We need to register for at least one bitfield */
> > > +    assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
> > 
> > Not sure if it makes sense to implement NOTIFIER_NONE as a no-op just
> > for orthogonality.
> 
> I would think it is a programming error when registering to IOMMO
> notifier without any flags set (or cap). So I put an assert() here.
> Even if we have a no-op thing, anyone registers with
> IOMMU_NOTIFIER_NONE flag is suspecious and strange.

True.

> > 
> > >      if (mr->iommu_ops->notify_started &&
> > > -        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
> > > +        QLIST_EMPTY(&mr->iommu_notify)) {
> > >          mr->iommu_ops->notify_started(mr);
> > 
> > As noted above, I think register_notify should get the ability to
> > fail, which would happen if notify_started() failed (obviously it
> > needs to get a failure mode as well.  Basically notify_started is
> > required to check that this vIOMMU is able to supply the notifications
> > that have been requested.
> 
> Same as above, I just failed to think out a use case for that yet. As
> long as we can have a use case for it, it's easy to replace the
> returned void type into something like int, and report it up along the
> way.

Hm, I suppose.

> 
> [...]
> 
> > > @@ -1564,8 +1569,22 @@ void 
> > > memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > >  void memory_region_notify_iommu(MemoryRegion *mr,
> > >                                  IOMMUTLBEntry entry)
> > >  {
> > > +    IOMMUNotifier *iommu_notifier;
> > > +    IOMMUNotifierCap request_cap;
> > > +
> > >      assert(memory_region_is_iommu(mr));
> > > -    notifier_list_notify(&mr->iommu_notify, &entry);
> > > +
> > > +    if (entry.perm & IOMMU_RW) {
> > > +        request_cap = IOMMU_NOTIFIER_CHANGE;
> > > +    } else {
> > > +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> > > +    }
> > 
> > As noted right at the top, I don't think this logic is really right.
> > An in-place change should be treated as both a map and unmap.
> 
> IIUC, guest needs to send two invalidations for an in-place change,
> right? So a "change" will actually trigger this twice.

No, or at least not necessarily.  How the invalidate is reported
really depends on the guest side vIOMMU interface.  Maybe it requires
an explicit invalidate followed by a set, maybe a direct in place
replacement is possible (the PAPR hypercall interface allows this at
least in theory).

What I had in mind here is that assuming the vIOMMU can detect an in
place change, it would then ping all notifiers that have *either* the
MAP or UNMAP flag bits set.

> (At least this should be true for Power, and I am guessing the same
>  for Intel, otherwise PowerIOMMU+VFIO should not work before this
>  series...)

I don't really follow what you're saying here.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]