qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC v4 15/20] intel_iommu: provide its own repla


From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH RFC v4 15/20] intel_iommu: provide its own replay() callback
Date: Mon, 23 Jan 2017 09:48:48 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1



On 2017年01月22日 16:51, Peter Xu wrote:
On Sun, Jan 22, 2017 at 03:56:10PM +0800, Jason Wang wrote:

[...]

+/**
+ * vtd_page_walk_level - walk over specific level for IOVA range
+ *
+ * @addr: base GPA addr to start the walk
+ * @start: IOVA range start address
+ * @end: IOVA range end address (start <= addr < end)
+ * @hook_fn: hook func to be called when detected page
+ * @private: private data to be passed into hook func
+ * @read: whether parent level has read permission
+ * @write: whether parent level has write permission
+ * @skipped: accumulated skipped ranges
What's the usage for this parameter? Looks like it was never used in this
series.
This was for debugging purpose before, and I kept it in case one day
it can be used again, considering that will not affect much on the
overall performance.

I think we usually do not keep debugging codes outside debug macros.


+ * @notify_unmap: whether we should notify invalid entries
+ */
+static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
+                               uint64_t end, vtd_page_walk_hook hook_fn,
+                               void *private, uint32_t level,
+                               bool read, bool write, uint64_t *skipped,
+                               bool notify_unmap)
+{
+    bool read_cur, write_cur, entry_valid;
+    uint32_t offset;
+    uint64_t slpte;
+    uint64_t subpage_size, subpage_mask;
+    IOMMUTLBEntry entry;
+    uint64_t iova = start;
+    uint64_t iova_next;
+    uint64_t skipped_local = 0;
+    int ret = 0;
+
+    trace_vtd_page_walk_level(addr, level, start, end);
+
+    subpage_size = 1ULL << vtd_slpt_level_shift(level);
+    subpage_mask = vtd_slpt_level_page_mask(level);
+
+    while (iova < end) {
+        iova_next = (iova & subpage_mask) + subpage_size;
+
+        offset = vtd_iova_level_offset(iova, level);
+        slpte = vtd_get_slpte(addr, offset);
+
+        /*
+         * When one of the following case happens, we assume the whole
+         * range is invalid:
+         *
+         * 1. read block failed
Don't get the meaning (and don't see any code relate to this comment).
I took above vtd_get_slpte() a "read", so I was trying to mean that we
failed to read the SLPTE due to some reason, we assume the range is
invalid.

I see, so we'd better move the comment above of vtd_get_slpte().


+         * 3. reserved area non-zero
+         * 2. both read & write flag are not set
Should be 1,2,3? And the above comment is conflict with the code at least
when notify_unmap is true.
Yes, okay I don't know why 3 jumped. :(

And yes, I should mention that "both read & write flag not set" only
suites for page tables here.

+         */
+
+        if (slpte == (uint64_t)-1) {
If this is true, vtd_slpte_nonzero_rsvd(slpte) should be true too I think?
Yes, but we are doing two checks here:

- checking against -1 to make sure slpte read success
- checking against nonzero reserved fields to make sure it follows spec

Imho we should not skip the first check here, only if one day removing
this may really matter (e.g., for performance reason? I cannot think
of one yet).

Ok. (return -1 seems not good, but we can address this in the future).


+            trace_vtd_page_walk_skip_read(iova, iova_next);
+            skipped_local++;
+            goto next;
+        }
+
+        if (vtd_slpte_nonzero_rsvd(slpte, level)) {
+            trace_vtd_page_walk_skip_reserve(iova, iova_next);
+            skipped_local++;
+            goto next;
+        }
+
+        /* Permissions are stacked with parents' */
+        read_cur = read && (slpte & VTD_SL_R);
+        write_cur = write && (slpte & VTD_SL_W);
+
+        /*
+         * As long as we have either read/write permission, this is
+         * a valid entry. The rule works for both page or page tables.
+         */
+        entry_valid = read_cur | write_cur;
+
+        if (vtd_is_last_slpte(slpte, level)) {
+            entry.target_as = &address_space_memory;
+            entry.iova = iova & subpage_mask;
+            /*
+             * This might be meaningless addr if (!read_cur &&
+             * !write_cur), but after all this field will be
+             * meaningless in that case, so let's share the code to
+             * generate the IOTLBs no matter it's an MAP or UNMAP
+             */
+            entry.translated_addr = vtd_get_slpte_addr(slpte);
+            entry.addr_mask = ~subpage_mask;
+            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+            if (!entry_valid && !notify_unmap) {
+                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                skipped_local++;
+                goto next;
+            }
Under which case should we care about unmap here (better with a comment I
think)?
When PSIs are for invalidation, rather than newly mapped entries. In
that case, notify_unmap will be true, and here we need to notify
IOMMU_NONE to do the cache flush or unmap.

(this page walk is not only for replaying, it is for cache flushing as
  well)

Do you have suggestion on the comments?

I think then we'd better move this to patch 18 which will use notify_unmap.


+            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
+                                    entry.addr_mask, entry.perm);
+            if (hook_fn) {
+                ret = hook_fn(&entry, private);
For better performance, we could try to merge adjacent mappings here. I
think both vfio and vhost support this and it can save a lot of ioctls.
Looks so, and this is in my todo list.

Do you mind I do it later after this series merged? I would really
appreciate if we can have this codes settled down first (considering
that this series has been dangling for half a year, or more, startint
from Aviv's series), and I am just afraid this will led to
unconvergence of this series (and I believe there are other places
that can be enhanced in the future as well).

Yes, let's do it on top.


+                if (ret < 0) {
+                    error_report("Detected error in page walk hook "
+                                 "function, stop walk.");
+                    return ret;
+                }
+            }
+        } else {
+            if (!entry_valid) {
+                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                skipped_local++;
+                goto next;
+            }
+            trace_vtd_page_walk_level(vtd_get_slpte_addr(slpte), level - 1,
+                                      iova, MIN(iova_next, end));
This looks duplicated?
I suppose not. The level is different.

Seem not? level - 1 was passed to vtd_page_walk_level() as level which did:

trace_vtd_page_walk_level(addr, level, start, end);



+            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte), iova,
+                                      MIN(iova_next, end), hook_fn, private,
+                                      level - 1, read_cur, write_cur,
+                                      &skipped_local, notify_unmap);
+            if (ret < 0) {
+                error_report("Detected page walk error on addr 0x%"PRIx64
+                             " level %"PRIu32", stop walk.", addr, level - 1);
Guest triggered, so better use debug macro or tracepoint.
Sorry. Will replace all the error_report() in the whole series.

+                return ret;
+            }
+        }
+
+next:
+        iova = iova_next;
+    }
+
+    if (skipped) {
+        *skipped += skipped_local;
+    }
+
+    return 0;
+}
+
+/**
+ * vtd_page_walk - walk specific IOVA range, and call the hook
+ *
+ * @ce: context entry to walk upon
+ * @start: IOVA address to start the walk
+ * @end: IOVA range end address (start <= addr < end)
+ * @hook_fn: the hook that to be called for each detected area
+ * @private: private data for the hook function
+ */
+static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
+                         vtd_page_walk_hook hook_fn, void *private)
+{
+    dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
+    uint32_t level = vtd_get_level_from_context_entry(ce);
+
+    if (!vtd_iova_range_check(start, ce)) {
+        error_report("IOVA start 0x%"PRIx64 " end 0x%"PRIx64" exceeds limits",
+                     start, end);
Guest triggered, better use debug macro or tracepoint.
Same.

+        return -VTD_FR_ADDR_BEYOND_MGAW;
+    }
+
+    if (!vtd_iova_range_check(end, ce)) {
+        /* Fix end so that it reaches the maximum */
+        end = vtd_iova_limit(ce);
+    }
+
+    trace_vtd_page_walk_level(addr, level, start, end);
Duplicated with the tracepoint in vtd_page_walk_level() too?
Nop? This should be the top level.

+
+    return vtd_page_walk_level(addr, start, end, hook_fn, private,
+                               level, true, true, NULL, false);
+}
+
  /* Map a device to its corresponding domain (context-entry) */
  static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
                                      uint8_t devfn, VTDContextEntry *ce)
@@ -2395,6 +2569,37 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
      return vtd_dev_as;
  }
+static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
+{
+    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    return 0;
+}
+
+static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
+{
+    VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    uint8_t bus_n = pci_bus_num(vtd_as->bus);
+    VTDContextEntry ce;
+
+    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
+        /*
+         * Scanned a valid context entry, walk over the pages and
+         * notify when needed.
+         */
+        trace_vtd_replay_ce_valid(bus_n, PCI_SLOT(vtd_as->devfn),
+                                  PCI_FUNC(vtd_as->devfn),
+                                  VTD_CONTEXT_ENTRY_DID(ce.hi),
+                                  ce.hi, ce.lo);
+        vtd_page_walk(&ce, 0, ~0, vtd_replay_hook, (void *)n);
~0ULL?
Fixing up.

Thanks,

-- peterx





reply via email to

[Prev in Thread] Current Thread [Next in Thread]