qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic


From: Jonathan Cameron
Subject: Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
Date: Fri, 26 Jul 2024 14:22:25 +0100

On Mon, 22 Jul 2024 08:45:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Enrich CPER error injection logic for ARM processor to allow
> setting values to  from UEFI 2.10 tables N.16 and N.17.
> 
> It should be noticed that, with such change, all arguments are
> now optional, so, once QMP is negotiated with:
> 
>       { "execute": "qmp_capabilities" }
> 
> the simplest way to generate a cache error is to use:
> 
>       { "execute": "arm-inject-error" }
> 
> Also, as now PEI is mapped into an array, it is possible to
> inject multiple errors at the same CPER record with:
> 
>       { "execute": "arm-inject-error", "arguments": {
>          "error": [ {"type": [ "cache-error" ]},
>                     {"type": [ "tlb-error" ]} ] } }
> 
> This would generate both cache and TLB errors, using default
> values for other fields.
> 
> As all fields from ARM Processor CPER are now mapped, all
> types of CPER records can be generated with the new QAPI.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
If you are happy to smash this into patch 4 then also take ownership
of the result and change the author as I wrote almost none of the code
that ended up in the result as only the GHESv2 stuff was mind
even before you joined this effort - the rest was Shiju's

If you want, I'm fine with a co-developed on the result

Jonathan



> ---
>  hw/acpi/ghes.c                  | 168 +++++++-------
>  hw/arm/arm_error_inject.c       | 399 +++++++++++++++++++++++++++++++-
>  hw/arm/arm_error_inject_stubs.c |  20 +-
>  include/hw/acpi/ghes.h          |  40 +++-
>  qapi/arm-error-inject.json      | 250 +++++++++++++++++++-
>  tests/lcitool/libvirt-ci        |   2 +-
>  6 files changed, 778 insertions(+), 101 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index ebf1b812aaaa..afd1d098a7e3 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c

> +    build_append_int_noprefix(table, err.running_state, 4);
> +
> +    /* PSCI state: only valid when running state is zero  */
> +    build_append_int_noprefix(table, err.psci_state, 4);
> +
> +    for (i = 0; i < err.err_info_num; i++) {
> +        /* ARM Propcessor error information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 1);
> +
> +        /*  Length */
> +        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
> +
> +        /* Validation Bits */
> +        build_append_int_noprefix(table, err.pei[i].validation, 2);

Maybe drop some comments when the data being written makes it obvious?

> +
> +        /* Type */
> +        build_append_int_noprefix(table, err.pei[i].type, 1);
> +
> +        /* Multiple error count */
> +        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
> +
> +        /* Flags  */
> +        build_append_int_noprefix(table, err.pei[i].flags, 1);
> +
> +        /* Error information  */
> +        build_append_int_noprefix(table, err.pei[i].error_info, 8);
> +
> +        /* Virtual fault address  */
> +        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
> +
> +        /* Physical fault address  */
> +        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
> +    }
> +
> +    for (i = 0; i < err.context_info_num; i++) {
> +        /* ARM Propcessor error context information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 2);
> +
> +        /* Validation type */
> +        build_append_int_noprefix(table, err.context[i].type, 2);
> +
> +        /* Register array size */
> +        build_append_int_noprefix(table, err.context[i].size * 8, 4);
> +
> +        /* Register array (byte 8 of Context info) */
> +        for (j = 0; j < err.context[i].size; j++) {
> +            build_append_int_noprefix(table, err.context[i].array[j], 8);
> +        }
>      }

> diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
> index 1da97d5d4fdc..67f1c77546b9 100644
> --- a/hw/arm/arm_error_inject.c
> +++ b/hw/arm/arm_error_inject.c
> @@ -10,23 +10,408 @@

> +
> +/* Handle ARM Context */
> +static ArmContext *qmp_arm_context(uint16_t *context_info_num,
> +                                   uint32_t *context_length,
> +                                   bool has_context,
> +                                   ArmProcessorContextList const 
> *context_list)
> +{
> +    ArmProcessorContextList const *next;
> +    ArmContext *context = NULL;
> +    uint16_t i, j, num, default_type;
> +
> +    default_type = get_default_context_type();
> +
> +    if (!has_context) {
> +        *context_info_num = 0;
> +        *context_length = 0;
> +
> +        return NULL;
> +    }
> +
> +    /* Calculate sizes */
> +    num = 0;
> +    for (next = context_list; next; next = next->next) {
> +        uint32_t n_regs = 0;
> +
> +        if (next->value->has_q_register) {
> +            uint64List *reg = next->value->q_register;
> +
> +            while (reg) {
> +                n_regs++;
> +                reg = reg->next;
> +            }
> +
> +            if (next->value->has_minimal_size &&
> +                                        next->value->minimal_size < n_regs) {
I'd align just after (

> +
> +static uint8_t *qmp_arm_vendor(uint32_t *vendor_num, bool 
> has_vendor_specific,
> +                               uint8List const *vendor_specific_list)
> +{
> +    uint8List const *next = vendor_specific_list;
> +    uint8_t *vendor = NULL, *p;

vendor always set before use.

> +
> +    if (!has_vendor_specific) {
> +        return NULL;
> +    }
> +
> +    *vendor_num = 0;
> +
> +    while (next) {
> +        next = next->next;
> +        (*vendor_num)++;
> +    }
> +
> +    vendor = g_malloc(*vendor_num);
> +
> +    p = vendor;
> +    next = vendor_specific_list;
> +    while (next) {
> +        *p = next->value;
> +        next = next->next;
> +        p++;
> +    }
> +
> +    return vendor;
> +}
> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> index 430e6cea6b60..2a314830fe60 100644
> --- a/qapi/arm-error-inject.json
> +++ b/qapi/arm-error-inject.json

> +
> +##
> +# @ArmProcessorErrorInformation:
> +#
> +# Contains ARM processor error information (PEI) data according with UEFI
> +# CPER table N.17.
> +#
> +# @validation:
> +#       Valid validation bits for error-info section.
> +#       Argument is optional. If not specified, those flags will be enabled:
> +#       first-error-cap and propagated.
> +#
> +# @type:
> +#       ARM processor error types to inject. Argument is mandatory.
> +#
> +# @multiple-error:
> +#       Indicates whether multiple errors have occurred.
> +#       Argument is optional. If not specified and @validation not enforced,

forced probably rather than enforced.

> +#       this field will be marked as invalid at CPER record..
. only

Good to mention the odd encoding of 0 = single error, 1 = multiple (lost count)
2+ = actual count of errors

> +#
> +# @flags:
> +#       Indicates flags that describe the error attributes.
> +#       Argument is optional. If not specified and defaults to
> +#       first-error and propagated.
> +#
> +# @error-info:
> +#       Error information structure is specific to each error type.
> +#       Argument is optional, and its value depends on the PEI type(s).
> +#       If not defined, the default depends on the type:
> +#       - for cache-error: 0x0091000F;
> +#       - for tlb-error: 0x0054007F;
> +#       - for bus-error: 0x80D6460FFF;
> +#       - for micro-arch-error: 0x78DA03FF;
> +#       - if multiple types used, this bit is disabled from @validation bits.
> +#
> +# @virt-addr:
> +#       Virtual fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# @phy-addr:
> +#       Physical fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorErrorInformation',
> +  'data': { '*validation': ['ArmPeiValidationBits'],
> +            'type': ['ArmProcessorErrorType'],
> +            '*multiple-error': 'uint16',
> +            '*flags': ['ArmProcessorFlags'],
> +            '*error-info': 'uint64',
> +            '*virt-addr':  'uint64',
> +            '*phy-addr': 'uint64'}
> +}
> +
> +##
> +# @ArmProcessorContext:
> +#
> +# Provide processor context state specific to the ARM processor architecture,
> +# According with UEFI 2.10 CPER table N.21.
> +# Argument is optional.If not specified, no context will be used.
                          ^ space
> +#
> +# @type:
> +#       Contains an integer value indicating the type of context state being
> +#       reported.
> +#       Argument is optional. If not defined, it will be set to be EL1 
> register
> +#       for the emulation, e. g.:
> +#       - on arm32: AArch32 EL1 context registers;
> +#       - on arm64: AArch64 EL1 context registers.
> +#
> +# @register:
> +#       Provides the contents of the actual registers or raw data, depending
> +#       on the context type.
> +#       Argument is optional. If not defined, it will fill the first register
> +#       with 0xDEADBEEF, and the other ones with zero.
We could fill this in with a valid snap shot I think?  It' just a set of CPU 
registers.
Obviously content would be pretty random and meaningless given the
error isn't correlated with particular activity (as we triggered it) but maybe 
would
useful for testing the parsing?

Perhaps that's a job for the future as we will want to be able to override it
anyway.

> +#
> +# @minimal-size:
> +#       Argument is optional. If provided, define the minimal size of the
> +#       context register array. The actual size is defined by checking the
> +#       number of register values plus the content of this field (if used),
> +#       ensuring that each processor context information structure array is
> +#       padded with zeros if the size is not a multiple of 16 bytes.
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorContext',
> +  'data': { '*type': 'uint16',
> +            '*minimal-size': 'uint32',
> +            '*register': ['uint64']}
>  }
>  
>  ##
>  # @arm-inject-error:
>  #
> -# Inject ARM Processor error.
> +# Inject ARM Processor error with data to be filled accordign with UEFI 2.10
> +# CPER table N.16.
>  #
> -# @errortypes: ARM processor error types to inject
> +# @validation:
> +#       Valid validation bits for ARM processor CPER.
> +#       Argument is optional. If not specified, the default is
> +#       calculated based on having the corresponding arguments filled.
> +#
> +# @affinity-level:
> +#       Error affinity level for errors that can be attributed to a specific
> +#       affinity level.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.
As below.

> +#
> +# @mpidr-el1:
> +#       Processor’s unique ID in the system.
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.
The zero case is obscure enough I'd be tempted to say that if we want
to test that then we will override the validation field.

The logic will end up simpler and still allow the same level of corner
case testing for no valid mpidr (which is really odd if it occurs!)

> +#
> +# @midr-el1:  Identification info of the chip
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.

Same as above.

> +#
> +# @running-state:
> +#       Indicates the running state of the processor.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.

Fun corners of the spec.  Can't allow bit0 of this and psci-state.
We should perhaps enforce that? I don't think we need to inject completely
invalid states (just corners of what is valid).

> +#
> +# @psci-state:
> +#       Provides PSCI state of the processor, as defined in ARM PSCI 
> document.
> +#       Argument is optional. If not specified, it will use the cpu power
> +#       state field from the emulation data.
Hmm. Do you think validation is meant to cover this? Is it under running-state?

> +#
> +# @context:
> +#       Contains an array of processor context registers.
> +#       Argument is optional. If not specified, no context will be added.
> +#
> +# @vendor-specific:
> +#       Contains a byte array of vendor-specific data.
> +#       Argument is optional. If not specified, no vendor-specific data
> +#       will be added.
> +#
> +# @error:
> +#       Contains an array of ARM processor error information (PEI) sections.
> +#       Argument is optional. If not specified, defaults to a single
> +#       Program Error Information record defaulting to type=cache-error.
>  #
>  # Features:
>  #
> @@ -44,6 +262,16 @@
>  # Since: 9.1
>  ##
>  { 'command': 'arm-inject-error',
> -  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> +  'data': {
> +    '*validation': ['ArmProcessorValidationBits'],
> +    '*affinity-level': 'uint8',
> +    '*mpidr-el1': 'uint64',
> +    '*midr-el1': 'uint64',
> +    '*running-state':  ['ArmProcessorRunningState'],
> +    '*psci-state': 'uint32',
> +    '*context': ['ArmProcessorContext'],
> +    '*vendor-specific': ['uint8'],
> +    '*error': ['ArmProcessorErrorInformation']
> +  },
>    'features': [ 'unstable' ]
>  }



reply via email to

[Prev in Thread] Current Thread [Next in Thread]