[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel
From: |
Peter Maydell |
Subject: |
Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM |
Date: |
Mon, 20 Jan 2020 12:15:21 +0000 |
On Fri, 17 Jan 2020 at 10:05, gengdongjiu <address@hidden> wrote:
>
> On 2020/1/17 0:28, Peter Maydell wrote:
> > On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <address@hidden> wrote:
> >
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> + ram_addr_t ram_addr;
> >> + hwaddr paddr;
> >> +
> >> + assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> + if (acpi_enabled && addr &&
> >> + object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> + ram_addr = qemu_ram_addr_from_host(addr);
> >> + if (ram_addr != RAM_ADDR_INVALID &&
> >> + kvm_physical_memory_addr_from_host(c->kvm_state, addr,
> >> &paddr)) {
> >> + kvm_hwpoison_page_add(ram_addr);
> >> + /*
> >> + * Asynchronous signal will be masked by main thread, so
> >> + * only handle synchronous signal.
> >> + */
> >
> > I don't understand this comment. (I think we've had discussions
> > about it before, but it's still not clear to me.)
> >
> > This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> >
> > (1) in the vcpu thread:
> > * the real SIGBUS handler sigbus_handler() sets a flag and arranges
> > for an immediate vcpu exit
> > * the vcpu thread reads the flag on exit from KVM_RUN and
> > calls kvm_arch_on_sigbus_vcpu() directly
> > * the error could be MCEERR_AR or MCEERR_AOFor the vcpu thread, the error
> > can be MCEERR_AR or MCEERR_AO,
> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO,
> because it needs do action immediately. For MCEERR_AO error, the action is
> optional and the error can be ignored.
> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>
> > (2) MCE errors on other threads:
> > * here SIGBUS is blocked, so MCEERR_AR (action-required)
> > errors will cause the kernel to just kill the QEMU process
> > * MCEERR_AO errors will be handled via the iothread's use
> > of signalfd(), so kvm_on_sigbus() will get called from
> > the main thread, and it will call kvm_arch_on_sigbus_vcpu()
> > * in this case the passed in CPUState will (arbitrarily) be that
> > for the first vCPU
>
> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is
> MCEERR_AR, the QEMU will assert and exit[2].
>
> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the
> iothread's use of signalfd() through above path.
> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it
> because SIGBUS is masked in main thread[3], for this case, I do not see QEMU
> handle it via signalfd() for MCEERR_AO errors from my test.
SIGBUS is blocked in the main thread because we use signalfd().
The function sigfd_handler() should be called and it will then
manually invoke the correct function for the signal.
> For Case1,I think we should not let guest know it, because it is not
> triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.
I don't understand what you mean here by "other app" or
"guest" triggering of MCEERR. I thought that an MCEERR meant
"the hardware has detected that there is a problem with the
RAM". If there's a problem with the RAM and it's the RAM that's
being used as guest RAM, we need to tell the guest, surely ?
> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().
It should do. The code you quote calls that function
for that case:
> [1]:
> /* Called synchronously (via signalfd) in main thread. */
> int kvm_on_sigbus(int code, void *addr)
> {
> #ifdef KVM_HAVE_MCE_INJECTION
> /* Action required MCE kills the process if SIGBUS is blocked. Because
> * that's what happens in the I/O thread, where we handle MCE via
> signalfd,
> * we can only get action optional here.
> */
> [2]: assert(code != BUS_MCEERR_AR);
> kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
> return 0;
> #else
> return 1;
> #endif
> }
> Above all, from my test, for MCEERR_AO error which is triggered by guest, it
> not call
kvm_arch_on_sigbus_vcpu().
I'm not sure what you mean by "triggered by guest". I assume that
exactly what kind of errors the kernel can report and when will
depend to some extent on the underlying hardware/firmware
implementation of reporting of memory errors, but in principle
the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
main thread, the signal should be handled by signalfd, our code
for working with multiple fds should mean that the main thread
calls sigfd_handler() to deal with reading bytes from the signalfd
fd, and that function should then call sigbus_handler(), which
calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
If something in that code path is not working then we need to
find out what it is.
thanks
-- PMM
- Re: [PATCH v22 5/9] ACPI: Record the Generic Error Status Block address, (continued)
Re: [PATCH v22 5/9] ACPI: Record the Generic Error Status Block address, Igor Mammedov, 2020/01/28
[PATCH v22 3/9] ACPI: Build related register address fields via hardware error fw_cfg blob, Dongjiu Geng, 2020/01/08
[PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM, Dongjiu Geng, 2020/01/08
[PATCH v22 4/9] ACPI: Build Hardware Error Source Table, Dongjiu Geng, 2020/01/08
[PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, Dongjiu Geng, 2020/01/08
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, Peter Maydell, 2020/01/16
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, Philippe Mathieu-Daudé, 2020/01/17
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, gengdongjiu, 2020/01/17
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, Peter Maydell, 2020/01/17
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, Philippe Mathieu-Daudé, 2020/01/17
- Re: [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries, gengdongjiu, 2020/01/19