From: gengdongjiu
Subject: Re: [Qemu-arm] [PATCH v14 2/9] ACPI: Add APEI GHES table generation and CPER record support
Date: Thu, 4 Jan 2018 12:21:55 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 2018/1/3 21:31, Igor Mammedov wrote:
> On Wed, 3 Jan 2018 10:21:06 +0800
> gengdongjiu <address@hidden> wrote:
>
> [...]
>>>
>>>> For simulation purposes, we hard-code the error
>>>> type to Multi-bit ECC.
>>> Not sure what this is about, care to elaborate?
>>
>> Please see the Memory Error Record in [1], in which the "Memory Error Type"
>> field describes the error type, such as Multi-bit ECC or Parity Error.
>> Because neither KVM nor the host passes the memory error type to QEMU,
>> QEMU does not know the error type for the memory section. Hence we let
>> QEMU simulate the error type as Multi-bit ECC.
> Agreed that in the case of TCG, QEMU is unlikely to have any way to get HW
> errors from the kernel, so it could be useful only for testing purposes
> (i.e. 'make check' and/or testing how the guest OS handles errors).
>
> But with KVM in the kernel it should be possible to fish the error out of
> the host kernel and forward it to the guest. If this is intended for
> handling HW errors, I'm not sure that 'Multi-bit ECC' could replace all
> real errors reported by host firmware.
Thanks for the mail.
I understand your point; let me explain in more detail.

(1) The memory error type is not actually important to the guest OS. When an
OS (such as the guest OS) performs memory recovery, it does not use the
memory error type. It mainly uses the memory_failure() function [1] for
recovery, and that function neither cares about nor even knows the memory
error type.

(2) Forwarding the error type from KVM to the guest would take more effort
and may not be worth doing. The real memory error type lives in the host
APEI table; only the host APEI driver can read it, and KVM cannot get it
directly. To forward it, KVM would first have to get the error type from the
APEI driver and then pass it to the guest, which James (address@hidden) may
oppose: I previously tried to export more error information to the guest,
and James did not agree to that. On the ARM64 platform, there is currently
no implementation that passes error information from the APEI driver to KVM
or to other kernel modules.

[1]:
int memory_failure(unsigned long pfn, int trapno, int flags)
{
......
}
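
To illustrate point (1), here is a minimal sketch, not kernel code, of a
recovery path that is keyed only on the page frame number. Every name in it
(the sketch_* identifiers, the struct, the 4 KiB page size) is an assumption
made for illustration; only the idea that recovery takes a pfn and no error
type comes from the memory_failure() signature above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; a real kernel takes this from the architecture. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical, trimmed-down view of a reported memory error. */
struct sketch_mem_error {
    uint64_t physical_addr; /* faulting physical address */
    uint8_t  error_type;    /* UEFI "Memory Error Type", e.g. 3 = Multi-bit ECC */
};

/* Remembers the last page handed to recovery so the sketch is observable. */
static uint64_t sketch_last_recovered_pfn;

/* Stand-in for memory_failure(): only the page frame number is passed in. */
static int sketch_recover_page(uint64_t pfn)
{
    sketch_last_recovered_pfn = pfn;
    return 0;
}

/* Hypothetical handler: error_type is never read on the recovery path. */
static int sketch_handle_mem_error(const struct sketch_mem_error *err)
{
    assert(err != NULL);
    uint64_t pfn = err->physical_addr >> SKETCH_PAGE_SHIFT;
    return sketch_recover_page(pfn);
}
```

In this sketch, two errors at the same address with different error types
end up recovering the same page, which is why hard-coding the type should
not change what the guest does.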
>
>
>> [1]:
>> UEFI Spec 2.6 Errata A:
>>
>> "N.2.5 Memory Error Section"
>> ------------------+-------------+-------------+---------------------------------------------
>> Mnemonic          | Byte Offset | Byte Length | Description
>> ------------------+-------------+-------------+---------------------------------------------
>> ........          | ........    | ........    | ........
>> ------------------+-------------+-------------+---------------------------------------------
>> Memory Error Type | 72          | 1           | Identifies the type of error that occurred:
>>                   |             |             |    0 - Unknown
>>                   |             |             |    1 - No error
>>                   |             |             |    2 - Single-bit ECC
>>                   |             |             |    3 - Multi-bit ECC
>>                   |             |             |    4 - Single-symbol ChipKill ECC
>>                   |             |             |    5 - Multi-symbol ChipKill ECC
>>                   |             |             |    6 - Master abort
>>                   |             |             |    7 - Target abort
>>                   |             |             |    8 - Parity Error
>>                   |             |             |    9 - Watchdog timeout
>>                   |             |             |   10 - Invalid address
>>                   |             |             |   11 - Mirror Broken
>>                   |             |             |   12 - Memory Sparing
>>                   |             |             |   13 - Scrub corrected error
>>                   |             |             |   14 - Scrub uncorrected error
>>                   |             |             |   15 - Physical Memory Map-out event
>>                   |             |             |   All other values reserved.
>> ------------------+-------------+-------------+---------------------------------------------
>> ........          | ........    | ........    | ........
>> ------------------+-------------+-------------+---------------------------------------------
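
For reference, the table above could be expressed in C roughly as follows.
This is only a sketch under stated assumptions: the sketch_* names are
hypothetical and are not the identifiers used in the patch, and the caller
is assumed to supply the section buffer and its length. The numeric values
and the byte offset 72 are taken from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Memory Error Type" values from UEFI 2.6 Errata A, N.2.5 (names are hypothetical). */
enum sketch_mem_error_type {
    SKETCH_MEM_ERR_UNKNOWN               = 0,
    SKETCH_MEM_ERR_NO_ERROR              = 1,
    SKETCH_MEM_ERR_SINGLE_BIT_ECC        = 2,
    SKETCH_MEM_ERR_MULTI_BIT_ECC         = 3,
    SKETCH_MEM_ERR_SINGLE_SYM_CHIPKILL   = 4,
    SKETCH_MEM_ERR_MULTI_SYM_CHIPKILL    = 5,
    SKETCH_MEM_ERR_MASTER_ABORT          = 6,
    SKETCH_MEM_ERR_TARGET_ABORT          = 7,
    SKETCH_MEM_ERR_PARITY                = 8,
    SKETCH_MEM_ERR_WATCHDOG_TIMEOUT      = 9,
    SKETCH_MEM_ERR_INVALID_ADDRESS       = 10,
    SKETCH_MEM_ERR_MIRROR_BROKEN         = 11,
    SKETCH_MEM_ERR_MEMORY_SPARING        = 12,
    SKETCH_MEM_ERR_SCRUB_CORRECTED       = 13,
    SKETCH_MEM_ERR_SCRUB_UNCORRECTED     = 14,
    SKETCH_MEM_ERR_PHYS_MAP_OUT          = 15
};

/* Byte offset of the one-byte "Memory Error Type" field, per the table. */
#define SKETCH_MEM_ERR_TYPE_OFFSET 72

/* Store the error type into a caller-provided memory error section buffer. */
static void sketch_set_mem_error_type(uint8_t *section, size_t len,
                                      enum sketch_mem_error_type type)
{
    assert(section != NULL && len > SKETCH_MEM_ERR_TYPE_OFFSET);
    section[SKETCH_MEM_ERR_TYPE_OFFSET] = (uint8_t)type;
}

/* Zero the section and hard-code the simulated type to Multi-bit ECC. */
static void sketch_fill_simulated_error(uint8_t *section, size_t len)
{
    memset(section, 0, len);
    sketch_set_mem_error_type(section, len, SKETCH_MEM_ERR_MULTI_BIT_ECC);
}
```

A real record would presumably also have to mark the field as valid in the
section's validation bits and fill in the other fields; that is left out
here to keep the sketch focused on the error type.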
> [...]
>
> .
>