From: Richard Henderson
Subject: Re: [PATCH v4 3/7] Add an "info pg" command that prints the current page tables
Date: Wed, 24 Jul 2024 13:33:13 +1000
User-agent: Mozilla Thunderbird

On 7/23/24 11:05, Don Porter wrote:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index d946161717..c70d31433d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -605,10 +605,11 @@ extern bool mttcg_enabled;
  /**
   * cpu_paging_enabled:
   * @cpu: The CPU whose state is to be inspected.
+ * @mmu_idx: 0 == traditional paging, 1 == nested paging
   *
   * Returns: %true if paging is enabled, %false otherwise.
   */
-bool cpu_paging_enabled(const CPUState *cpu);
+bool cpu_paging_enabled(const CPUState *cpu, int mmu_idx);

mmu_idx already means something very different to TCG.
You will only confuse matters by using this name for a different concept here.
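
For instance (a sketch only; the enum and its values below are placeholder
names, not existing QEMU identifiers), a dedicated enumerator would keep the
TCG meaning of mmu_idx out of this interface entirely:

    /* Placeholder names, for illustration only. */
    typedef enum PageTableRoot {
        PG_ROOT_NORMAL = 0,  /* the guest's own, first-stage page table */
        PG_ROOT_NESTED = 1,  /* the nested/second-stage page table, if any */
    } PageTableRoot;

    bool cpu_paging_enabled(const CPUState *cpu, PageTableRoot root);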


@@ -671,9 +672,82 @@ int cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
   * Caller is responsible for freeing the data.
   */
  GuestPanicInformation *cpu_get_crash_info(CPUState *cpu);
-
  #endif /* !CONFIG_USER_ONLY */
+/* Maximum supported page table height - currently 5 levels, on x86 */
+#define MAX_HEIGHT 5
+
+typedef struct PageTableLayout {
+    int height; /* Height of the page table */
+    int entries_per_node[MAX_HEIGHT + 1];
+} PageTableLayout;
+
+typedef struct DecodedPTE {
+    int prot; /* Always populated, arch-specific, decoded flags */
+    bool present;
+    bool leaf; /* Only valid if present */
+    bool reserved_bits_ok;
+    bool user_read_ok;
+    bool user_write_ok;
+    bool user_exec_ok;
+    bool super_read_ok;
+    bool super_write_ok;
+    bool super_exec_ok;
+    bool dirty;
+    hwaddr child; /* Only valid if present and !leaf */
+    uint64_t leaf_page_size; /* Only valid if present and leaf */
+    uint64_t nested_page_size; /*
+                                * If nested paging, the page size of the host
+                                * page storing the data, versus the size of the
+                                * guest page frame in leaf_page_size
+                                */
+    vaddr bits_translated; /*
+                            * The virtual address bits translated in walking
+                            * the page table to node[i].
+                            */
+    hwaddr pte_addr; /* (guest) physical address of the PTE */
+    hwaddr pte_host_addr; /* (host) physical address of the PTE */
+    uint64_t pte_contents; /* Raw contents of the pte */
+} DecodedPTE;
+
+typedef int (*qemu_page_walker_for_each)(CPUState *cs, void *data,
+                                         DecodedPTE *pte,
+                                         int height, int offset, int mmu_idx,
+                                         const PageTableLayout *layout);
+
+/**
+ * for_each_pte - iterate over a page table, and
+ *                call fn on each entry
+ *
+ * @cs - CPU state
+ * @fn(cs, data, pte, height, offset, mmu_idx, layout) - User-provided
+ *                                              function to call on each pte.
+ *   * @cs - pass through cs
+ *   * @data - user-provided, opaque pointer
+ *   * @pte - current pte, decoded
+ *   * @height - height in the tree of pte
+ *   * @offset - offset within the page table node
+ *   * @layout - pointer to a PageTableLayout for this tree
+ * @data - opaque pointer; passed through to fn
+ * @visit_interior_nodes - if true, call fn() on interior entries in
+ *                         page table; if false, visit only leaf entries.
+ * @visit_not_present - if true, call fn() on entries that are not present.
+ *                         if false, visit only present entries.
+ * @mmu_idx - Which level of the mmu we are interested in:
+ *            0 == user mode, 1 == nested page table
+ *            Note that MMU_*_IDX macros are not consistent across
+ *            architectures.
+ *
+ * Returns true on success, false on error.
+ *
+ * We assume all callers of this function are in debug mode, and do not
+ * want to synthesize, say, a user-mode load, on each page in the address
+ * space.
+ */
+bool for_each_pte(CPUState *cs, qemu_page_walker_for_each fn, void *data,
+                  bool visit_interior_nodes, bool visit_not_present,
+                  bool visit_malformed, int mmu_idx);
+

None of this should be in hw/core/cpu.h.
Isolate it somewhere else, not included by 80% of QEMU.
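
As one possible layout (sketch only; the header name is hypothetical), the
walker declarations could live in a header that only the monitor/debug code
pulls in:

    /* include/hw/core/page-walker.h (hypothetical new header) */
    #ifndef HW_CORE_PAGE_WALKER_H
    #define HW_CORE_PAGE_WALKER_H

    #include "exec/hwaddr.h"
    #include "hw/core/cpu.h"

    /*
     * PageTableLayout, DecodedPTE, for_each_pte() and friends move here,
     * so that the rest of QEMU never sees them.
     */

    #endif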


diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index 4c94e51267..d0e939def8 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -12,6 +12,43 @@
#include "hw/core/cpu.h" +/*
+ * struct mem_print_state: Used by qmp in walking page tables.
+ */
+struct mem_print_state {
+    GString *buf;
+    CPUArchState *env;
+    int vaw, paw; /* VA and PA width in characters */
+    int max_height;
+    int mmu_idx; /* 0 == user mode, 1 == nested page table */
+    bool (*flusher)(CPUState *cs, struct mem_print_state *state);
+    bool flush_interior; /* If false, only call flusher() on leaves */
+    bool require_physical_contiguity;
+    /*
+     * The height at which we started accumulating ranges, i.e., the
+     * next height we need to print once we hit the end of a
+     * contiguous range.
+     */
+    int start_height;
+    int leaf_height; /* The height at which we found a leaf, or -1 */
+    /*
+     * For compressing contiguous ranges, track the
+     * start and end of the range
+     */
+    hwaddr vstart[MAX_HEIGHT + 1]; /* Starting virt. addr. of open pte range */
+    hwaddr vend[MAX_HEIGHT + 1]; /* Ending virtual address of open pte range */
+    hwaddr pstart; /* Starting physical address of open pte range */
+    hwaddr pend; /* Ending physical address of open pte range */
+
+    /* PTE protection flags along the current root->leaf path */
+    uint64_t prot[MAX_HEIGHT + 1];
+
+    /* Page size (leaf) or address range covered (non-leaf). */
+    uint64_t pg_size[MAX_HEIGHT + 1];
+    int offset[MAX_HEIGHT + 1]; /* PTE range starting offsets */
+    int last_offset[MAX_HEIGHT + 1]; /* PTE range ending offsets */
+};

Likewise.
Also, CamelCase for typedefs, per coding style.
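
I.e. something along these lines (sketch only; the remaining fields stay as
posted):

    /* Same contents, renamed per coding style. */
    typedef struct MemPrintState {
        GString *buf;
        CPUArchState *env;
        /* ... rest of the fields unchanged ... */
    } MemPrintState;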


+    if (env->hflags & HF_GUEST_MASK) {
+
+        /* Extract the EPTP value from vmcs12 structure, store in arch state */
+        if (env->nested_state->format == KVM_STATE_NESTED_FORMAT_VMX) {
+            struct vmcs12 *vmcs =
+                (struct vmcs12 *) env->nested_state->data.vmx->vmcs12;

This is not required.  You appear to be confused by nested paging.

First: nested paging is how hardware virtualization works. When we are *using* hardware virtualization, all of that is the kernel's job. Our job as hypervisor is to give a bag of pages to the kernel and have it map them into the guest intermediate address space.

When we are *using* hardware virtualization, we are only ever concerned with one level of paging: from the guest to the intermediate address space. From there we use QEMU data structures to map to QEMU virtual address space (address_space_ld/st, etc).

This is all we will ever see from KVM, HVF etc.

With TCG, we can *emulate* hardware virtualization. It is at this point where we are concerned about two levels of paging, because QEMU is handling both.
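
As a minimal sketch (not code from this patch; it assumes pte_gpa is a
guest-physical PTE address produced by the walk), under KVM, HVF etc. a
walker only ever needs the guest-physical address space QEMU already models:

    #include "qemu/osdep.h"
    #include "exec/memory.h"   /* address_space_ldq(), MEMTXATTRS_UNSPECIFIED */
    #include "hw/core/cpu.h"   /* CPUState */

    /*
     * Read one 64-bit PTE at a guest-physical address through cs->as.
     * There is no second page-table walk here; QEMU's AddressSpace
     * resolves the intermediate (guest-physical) address for us.
     */
    static uint64_t read_guest_pte(CPUState *cs, hwaddr pte_gpa)
    {
        MemTxResult res;
        uint64_t val = address_space_ldq(cs->as, pte_gpa,
                                         MEMTXATTRS_UNSPECIFIED, &res);
        return res == MEMTX_OK ? val : 0;
    }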


r~


