

From: Liran Alon
Subject: Re: [Qemu-devel] [PATCH] ioapic: allow buggy guests mishandling level-triggered interrupts to make progress
Date: Mon, 1 Apr 2019 20:37:33 +0300


> On 1 Apr 2019, at 20:28, Vitaly Kuznetsov <address@hidden> wrote:
> 
> Liran Alon <address@hidden> writes:
> 
>>> On 1 Apr 2019, at 18:58, Vitaly Kuznetsov <address@hidden> wrote:
>>> 
>>> Liran Alon <address@hidden> writes:
>>> 
>>>>> On 1 Apr 2019, at 16:36, Vitaly Kuznetsov <address@hidden> wrote:
>>>>> 
>>>>> It was found that Hyper-V 2016 on KVM in some configurations (q35 machine +
>>>>> piix4-usb-uhci) hangs on boot. Trace analysis led us to the conclusion that
>>>>> it is mishandling level-triggered interrupt performing EOI without fixing
>>>>> the root cause.
>>>> 
>>>> I would rephrase as:
>>>> It was found that Hyper-V 2016 on KVM in some configurations (q35 machine 
>>>> + piix4-usb-uhci) hangs on boot.
>>>> The root cause is that one of Hyper-V's level-triggered interrupt handlers
>>>> performs EOI before fixing the cause of the interrupt.
>>>> This results in the IOAPIC re-raising the level-triggered interrupt
>>>> after EOI, because the irq-line remains asserted.
>>> 
>>> Ok, thanks for the suggestion.
>>> 
>>>> 
>>>>> This causes immediate re-assertion and L2 VM (which is
>>>>> supposedly expected to fix the cause of the interrupt) is not making any
>>>>> progress.
>>>> 
>>>> I don’t know why you assume this.
>>>> From the trace we have examined, it seems that the EOI is performed by
>>>> Hyper-V and not its guest.
>>>> This means that the handler for this level-triggered interrupt is in
>>>> Hyper-V and not in its guest.
>>> 
>>> If you let it run (with e.g. this patch or by setting preemption timer >
>>> 0) you'll see that the MMIO write fixing the cause of the interrupt is
>>> happening from L2:
>>> 
>>> (qemu) info pci:
>>> 
>>> Bus  0, device   4, function 0:
>>>   USB controller: PCI device 8086:7112
>>>     PCI subsystem 1af4:1100
>>>     IRQ 23.
>>>     BAR4: I/O at 0x6060 [0x607f].
>>>     id ""
>>> 
>>> ...
>>> 538597.212494: kvm_exit:             reason VMRESUME rip 0xfffff80004250115 info 0 0
>>> 538597.212499: kvm_entry:            vcpu 0
>>> 538597.212506: kvm_exit:             reason IO_INSTRUCTION rip 0xfffff80e02ac6a27 info 60620009 0
>>> 538597.212507: kvm_nested_vmexit:    rip fffff80e02ac6a27 reason IO_INSTRUCTION info1 60620009 info2 0 int_info 0 int_info_err 0
>>> 538597.212509: kvm_fpu:              unload
>>> 538597.212511: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>> 538597.212516: kvm_fpu:              load
>>> 538597.212518: kvm_pio:              pio_read at 0x6062 size 2 count 1 val 0x1
>>> 538597.212519: kvm_entry:            vcpu 0
>>> 538597.212523: kvm_exit:             reason IO_INSTRUCTION rip 0xfffff80e02ac6a61 info 60640009 0
>>> 538597.212523: kvm_nested_vmexit:    rip fffff80e02ac6a61 reason IO_INSTRUCTION info1 60640009 info2 0 int_info 0 int_info_err 0
>>> 538597.212524: kvm_fpu:              unload
>>> 538597.212525: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>> 538597.212528: kvm_fpu:              load
>>> 538597.212528: kvm_pio:              pio_read at 0x6064 size 2 count 1 val 0xf
>>> ...
>>> 
>>> and this happens after EOI from L1.
>> 
>> I see that the L2 guest is doing I/O read to the device BAR4 but do these 
>> reads lower the irq-line?
>> I would expect a write to lower the irq-line.
>> 
>> Looking at uhci_port_read(), it seems that offsets 0x02 and 0x04 just return
>> a value; reading them doesn’t lower the irq-line
>> (even though offset 0x04 returns the “interrupt enable register”).
>> In contrast, looking at uhci_port_write(), it seems that writing to either
>> offset 0x02 or 0x04 could lower the irq-line.
>> So you should look for a pio_write to port 0x6062 or 0x6064 to see who
>> is actually responsible for lowering the irq-line.
> 
> Sorry, I probably like trimming traces too much. Writes happen too:
> 
> [005] 538597.212532: kvm_exit:             reason IO_INSTRUCTION rip 0xfffff80e02ac6a8f info 60620001 0
> [005] 538597.212533: kvm_nested_vmexit:    rip fffff80e02ac6a8f reason IO_INSTRUCTION info1 60620001 info2 0 int_info 0 int_info_err 0
> [005] 538597.212534: kvm_pio:              pio_write at 0x6062 size 2 count 1 val 0x1

This clears the UHCI_STS_USBINT bit in the “status” register and zeroes the
“status2” register.
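
For readers matching the kvm_exit lines to the kvm_pio lines: the sketch below
decodes the "info" value, assuming it is the VMX exit qualification for an I/O
instruction (size in bits 2:0, direction in bit 3, port in bits 31:16). The
struct and function names are hypothetical and not taken from KVM or QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an I/O-instruction exit qualification (hypothetical type). */
struct io_exit_qual {
    unsigned size;    /* access size in bytes */
    bool is_in;       /* true for IN (port read), false for OUT (port write) */
    bool is_string;   /* INS/OUTS */
    bool is_rep;      /* REP-prefixed */
    uint16_t port;    /* I/O port number */
};

/* Assumption: q is laid out as the VMX I/O exit qualification. */
static struct io_exit_qual decode_io_exit_qual(uint64_t q)
{
    struct io_exit_qual d;

    d.size = (unsigned)(q & 0x7) + 1;   /* bits 2:0 hold size minus one */
    d.is_in = (q >> 3) & 1;             /* bit 3: 0 = OUT, 1 = IN */
    d.is_string = (q >> 4) & 1;         /* bit 4: string instruction */
    d.is_rep = (q >> 5) & 1;            /* bit 5: REP prefix */
    d.port = (uint16_t)(q >> 16);       /* bits 31:16: port number */
    return d;
}
```

Under that assumption, 0x60620009 decodes to a 2-byte read of port 0x6062 and
0x60620001 to a 2-byte write of the same port, which agrees with the pio_read
and pio_write lines in the traces.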

> [005] 538597.212534: kvm_fpu:              unload
> [005] 538597.212535: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
> [005] 538597.212543: kvm_fpu:              load
> [005] 538597.212544: kvm_entry:            vcpu 0
> [005] 538597.212547: kvm_exit:             reason IO_INSTRUCTION rip 0xfffff80e02ac6a9c info 60640001 0
> [005] 538597.212548: kvm_nested_vmexit:    rip fffff80e02ac6a9c reason IO_INSTRUCTION info1 60640001 info2 0 int_info 0 int_info_err 0
> [005] 538597.212548: kvm_pio:              pio_write at 0x6064 size 2 count 1 val 0x0

This writes 0 to the “intr” register (the interrupt enable register).
At this point it is indeed likely that uhci_update_irq() will lower the
irq-line. I agree. :)
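
A toy model of the two writes discussed above, reduced to a single interrupt
source. It is a sketch and not QEMU's code: the struct, the function names,
and the choice of bit 2 as the enable bit are assumptions made for
illustration. Only the register offsets, UHCI_STS_USBINT and the “status2”
latch come from this thread.

```c
#include <stdint.h>

#define UHCI_STS_USBINT   (1u << 0)  /* status bit named in the message above */
#define MODEL_INTR_IOC_EN (1u << 2)  /* assumed enable bit for this sketch */

/* Hypothetical, reduced device state. */
struct uhci_model {
    uint16_t status;    /* "status" register, offset 0x02 */
    uint16_t intr;      /* "intr" (interrupt enable) register, offset 0x04 */
    uint8_t  status2;   /* hidden latch, bit 0 = completion pending */
    int      irq_level; /* modelled state of the level-triggered line */
};

/* The line is high only while a source is both pending and enabled. */
static void model_update_irq(struct uhci_model *s)
{
    s->irq_level = (s->status2 & 1) && (s->intr & MODEL_INTR_IOC_EN);
}

static void model_port_write(struct uhci_model *s, unsigned offset, uint16_t val)
{
    switch (offset) {
    case 0x02:
        /* Write-1-to-clear: every bit set in val is cleared in status. */
        s->status &= (uint16_t)~val;
        if (val & UHCI_STS_USBINT) {
            s->status2 = 0;
        }
        break;
    case 0x04:
        /* Plain write of the enable mask. */
        s->intr = val;
        break;
    default:
        return;         /* other offsets are not modelled */
    }
    model_update_irq(s);
}
```

In this model either write on its own is enough to drop the line: writing 0x1
to offset 0x02 removes the pending condition, and writing 0x0 to offset 0x04
masks it.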

> [005] 538597.212549: kvm_fpu:              unload
> [005] 538597.212550: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
> 
> and this likely lowers the line.

OK, I am convinced that L2 does the irq-line lowering.

> I honestly have no idea how this all
> works on real hw but the comment in kernel ioapic says something about
> non-immediate delivery of the reasserted interrupt. True or not, this
> gives me some peace of mind :-)

Yeah I know.
It’s just weird that this non-immediate delivery translates to “you can even
resume into VMX non-root mode and run a bunch of logic that lowers the
irq-line” before the interrupt is re-raised.
That’s kinda crazy :)
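
To make the immediate re-raise concrete, here is a toy model of one
level-triggered IOAPIC pin in which EOI clears Remote IRR and the interrupt is
delivered again at once if the line is still high. All names are hypothetical.
It models neither the patch under discussion nor any delay in redelivery.

```c
#include <stdbool.h>

/* Hypothetical model of a single level-triggered IOAPIC pin. */
struct ioapic_pin_model {
    bool line_asserted;   /* current level of the device's irq-line */
    bool remote_irr;      /* set on delivery, cleared by EOI */
    unsigned deliveries;  /* how many times the interrupt was delivered */
};

/* Deliver if the line is high and no delivery is awaiting EOI. */
static void pin_service(struct ioapic_pin_model *p)
{
    if (p->line_asserted && !p->remote_irr) {
        p->remote_irr = true;
        p->deliveries++;
    }
}

/* The device raises or lowers its line. */
static void pin_set_line(struct ioapic_pin_model *p, bool level)
{
    p->line_asserted = level;
    pin_service(p);
}

/* EOI from the handler: clear Remote IRR, then re-check the line. */
static void pin_eoi(struct ioapic_pin_model *p)
{
    p->remote_irr = false;
    pin_service(p);
}
```

Calling pin_eoi() while the line is still asserted increments deliveries
again, which is the loop the handler gets stuck in. Calling
pin_set_line(p, false) before pin_eoi() ends it.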

It’s more likely that this also reproduces on real hardware and that Microsoft
maybe hasn’t tested with this hardware?
Anyway, we are on the same page here. :)

-Liran

> 
> -- 
> Vitaly








