qemu-devel

Re: [PATCH] tests/functional: Convert the kvm_xen_guest avocado test


From: David Woodhouse
Subject: Re: [PATCH] tests/functional: Convert the kvm_xen_guest avocado test
Date: Wed, 18 Dec 2024 22:42:47 +0100
User-agent: Evolution 3.52.3-0ubuntu1

On Wed, 2024-12-18 at 17:19 +0100, Thomas Huth wrote:
> On 18/12/2024 15.11, David Woodhouse wrote:
> > On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
> ...
> > > But FWIW, there seems to be another issue with this test. While running it
> > > multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging.
> > > According to the console output, the guest waits in vain for a device:
> > > 
> > > 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
> > > 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
> > > 2024-12-18 14:32:58,609: Segment Routing with IPv6
> > > 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
> > > 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
> > > 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
> > > 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
> > > 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
> > > 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
> > > 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
> > > 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
> > > 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
> > > 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
> > > 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
> > > 
> > > Have you seen this problem before?
> > 
> > That seems like event channel interrupts aren't being routed to the
> > legacy i8259 PIC. I've certainly seen that kind of thing before,
> > especially when asserted level-triggered interrupts weren't correctly
> > being asserted. But I don't expect that of QEMU. I'll see if I can
> > reproduce; thanks.
> > 
> > How often does it happen?
> 
> With the new functional test, it happens maybe 2 times out of 100 test runs.
> 
> I wasn't able to reproduce it with the avocado version yet, but that also 
> runs 10x slower, so it takes a longer time to get to that many runs...

I can reproduce it in roughly one attempt out of ten.

It seems like it's because of the way QEMU handles shared level-
triggered interrupts.

We partly work around this with PCI INTx demultiplexing, but the Xen
guest explicitly asks for its interrupt to be delivered on IRQ10. So
the Xen side asserts IRQ10, then (I suspect) something on the PCI side
deasserts it, and the Xen interrupt is lost.

The guest configures the Xen event channel IRQ to be delivered on
IRQ10, but IRQ10 is *also* the PCI INTx line of some device. So if the
PCI device *clears* IRQ10 at the wrong moment, we miss a Xen
interrupt... which means we end up waiting forever.

We *really* ought to do this with a callback when the interrupt is
acked in the PIC/IOAPIC; any interrupt source which still wants the
line asserted can then reassert it from that callback. That is
precisely how the vfio eventfd 'resampler' works. Or *should* work, if
QEMU were actually capable of working that way.

I'll see if I can come up with a workaround... or whether I fall into
the rabbit hole of fixing level-triggered interrupt handling in general.



