qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of disc


From: Pierrick Bouvier
Subject: Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of discon event's to_pc
Date: Fri, 20 Dec 2024 13:46:52 -0800
User-agent: Mozilla Thunderbird

On 12/20/24 13:17, Pierrick Bouvier wrote:
Hi Julian,

On 12/20/24 03:47, Julian Ganz wrote:
Hi Pierrick,

December 5, 2024 at 12:33 AM, "Pierrick Bouvier" wrote:
On 12/2/24 11:41, Julian Ganz wrote:
   +static void insn_exec(unsigned int vcpu_index, void *userdata)
   +{
   + struct cpu_state *state = qemu_plugin_scoreboard_find(states, vcpu_index);
   + uint64_t pc = (uint64_t) userdata;
   + GString* report;
   +
   + if (state->has_next) {
   + if (state->next_pc != pc) {
   + report = g_string_new("Trap target PC mismatch\n");
   + g_string_append_printf(report,
   + "Expected: %"PRIx64"\nEncountered: %"
   + PRIx64"\n",
   + state->next_pc, pc);
   + qemu_plugin_outs(report->str);
   + if (abort_on_mismatch) {
   + g_abort();
   + }
   + g_string_free(report, true);
   + }
   + state->has_next = false;
   + }
   +}

When booting an arm64 vm, I get this message:
Trap target PC mismatch
Expected: 23faf3a80
Encountered: 23faf3a84

A colleague of mine went to great lengths trying to track and reliably
reproduce this. We think that it's something amiss with the existing
instruction exec callback infrastructure. So... it's not something I'll
be addressing with the next iteration as it's out of scope. We'll
probably continue looking into it, though.

The mismatch is reported perfectly normal and boring exceptions and
interrupts with no indication of any differences to other (not reported)
events that fire on a regular basis. Apparently, once in a blue moon
(relatively speaking), for the first instruction of a handler (even
though it is definitely executed and qemu does print a trace-line for
that instruction):

| Trace 0: 0x7fffa0b03900 [00104004/000000023fde73b4/00000021/ff020200]
| Trace 0: 0x7fffa02d9580 [00104004/000000023fde72b8/00000021/ff020200]
| Trace 0: 0x7fffa02dfc40 [00104004/000000023fde7338/00000021/ff020200]
| Trace 0: 0x7fffa0b03d00 [00104004/000000023fde73d4/00000021/ff020200]
| Trace 0: 0x7fffa0b03e80 [00104004/000000023fde73d8/00000021/ff020200]
| Trace 0: 0x7fffa0b04140 [00104004/000000023fde7408/00000021/ff020200]
| Trace 0: 0x7fffa02dd6c0 [00104004/000000023fde70b8/00000021/ff020200]
| Trace 0: 0x7fffa02dd800 [00104004/000000023fde7b90/00000021/ff020200]
| cpu_io_recompile: rewound execution of TB to 000000023fde7b90
| Taking exception 5 [IRQ] on CPU 0
| ...from EL1 to EL1
| ...with ESR 0x0/0x3800000
| ...with SPSR 0x20000305
| ...with ELR 0x23fde7b90
| ...to EL1 PC 0x23fd77a80 PSTATE 0x23c5
| Trace 0: 0x7fffa13a8340 [00104004/000000023fd77a80/00000021/ff021201]
| Trace 0: 0x7fffa13a8480 [00104004/000000023fd77a84/00000021/ff020200]
| Trap target PC mismatch CPU 0
| Expected:    23fd77a80
| Encountered: 23fd77a84
| warning: 44   ./nptl/pthread_kill.c: No such file or directory
| Couldn't get registers: No such process.

It does show up with both single-core and multi-core VMs, so that at
least eliminates some possibilities. Maybe :/

The issue is nasty to reproduce in a way that allows any meaningful
investigation. It usually involves sifting through many GBs of Qemu logs
for maybe one occurance. We could add another testing/dummy plugin that
just prints the PC for _any_ instruction executed and have a skript
check for non-alternating Trace-lines from Qemu and that Plugin. But
then we're talking nearly double the amount of Lines to look through
with probably little additional information.


Thanks for the investigation.
I could reproduce this with this command line:
./build/qemu-system-aarch64 -M virt -plugin
./build/tests/tcg/plugins/libdiscons.so,abort=on -m 8G -device
virtio-blk-pci,drive=root -drive
if=none,id=root,file=/home/user/.work/images/debianaarch64.img -M virt
-cpu max,pauth=off  -drive
if=pflash,readonly=on,file=/usr/share/AAVMF/AAVMF_CODE.fd -drive
if=pflash,file=/home/user/.work/images/AAVMF_VARS.fd -d plugin,in_asm,op
-D crash.log

# -d plugin,in_asm,op allows to dump asm of every translated block,
plugin output (for discon plugin), and tcg op generated.

It reliably crashes with a single address.
Looking at the debug output (crash.log):
----------------
IN:
0x23faf3a80:  d108c3ff  sub      sp, sp, #0x230
# => This bb has a single instruction as input

OP:
# this is the TB instrumentation
   ld_i32 loc0,env,$0xfffffffffffffff8
   brcond_i32 loc0,$0x0,lt,$L0
   st8_i32 $0x1,env,$0xfffffffffffffffc

   ---- 0000000000000a80 0000000000000000 0000000000000000
# => we can see that there is no call_plugin, looks like instrumentation
# is not applied
   sub_i64 sp,sp,$0x230
   add_i64 pc,pc,$0x4
   goto_tb $0x1
   exit_tb $0x7f7eedd355c1
   set_label $L0
   exit_tb $0x7f7eedd355c3

----------------
IN:
0x23faf3a84:  a9b007e0  stp      x0, x1, [sp, #-0x100]!
0x23faf3a88:  a9010fe2  stp      x2, x3, [sp, #0x10]
...

OP:
   ld_i32 loc0,env,$0xfffffffffffffff8
   brcond_i32 loc0,$0x0,lt,$L0
   st8_i32 $0x0,env,$0xfffffffffffffffc

   ---- 0000000000000a84 0000000000000000 0000000000000000
# instruction is correctly applied
call plugin(0x7f7eec96d530),$0x1,$0,$0x0,$0x23faf3a84
   mov_i64 loc2,sp
   ...

Trap target PC mismatch
Expected:    23faf3a80
Encountered: 23faf3a84

The thing interesting here is that we can notice that 23faf3a80 is a
translation block with a single instruction, and we can see that
instrumentation is not applied for this instruction (call_plugin is not
present).

Overall, it really looks like a bug on QEMU side, where we miss
instrumenting something. I'll take a look. You can ignore this for now.


It seems like we have a problem to identify tb as mem_only. This was introduced to prevent double instrumentation of some memory access (touching MMIO), but it seems that as a result, we skip some instructions sometimes.

I need to dig into this further, but for now, you should be able to workaround this on your side with this patch:

diff --git a/plugins/api.c b/plugins/api.c
index 24ea64e2de5..6cb9d81a0a2 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -92,6 +92,7 @@ void qemu_plugin_register_vcpu_exit_cb(qemu_plugin_id_t id,

 static bool tb_is_mem_only(void)
 {
+    return false;
     return tb_cflags(tcg_ctx->gen_tb) & CF_MEMI_ONLY;
 }

Regards,
Julian


Regards,
Pierrick



reply via email to

[Prev in Thread] Current Thread [Next in Thread]