[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection vi
From: |
Igor Mammedov |
Subject: |
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI |
Date: |
Thu, 8 Aug 2024 14:45:14 +0200 |
On Thu, 8 Aug 2024 14:11:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 6 Aug 2024 16:31:13 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > > + /* Could also be read back from the error_block_address register */
> > > + *error_block_addr = base +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > > +
> > > + return true;
> > > +}
> >
> > I don't like all this pointer math, which is basically a reverse engineered
> > QEMU actions on startup + guest provided etc/hardware_errors address.
> >
> > For once, it assumes error_source_to_index[] matches order in which HEST
> > error sources were described, which is fragile.
> >
> > 2nd: migration-wive it's disaster, since old/new HEST/hardware_errors tables
> > in RAM migrated from older version might not match above assumptions
> > of target QEMU.
> >
> > I see 2 ways to rectify it:
> > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of
> > HEST table
> > in guest RAM, like we do with etc/hardware_errors, see
> > build_ghes_error_table()
> > ...
> > tell firmware to write hardware_errors GPA into
> > and then fetch from HEST table in RAM, the guest patched error/ack
> > addresses
> > for given source_id
> >
> > code-wise: relatively simple once one wraps their own head over
> > how this whole APEI thing works in QEMU
> > workflow is described in docs/specs/acpi_hest_ghes.rst
> > look to me as sufficient to grasp it.
> > (but my view is very biased given my prior knowledge,
> > aka: docs/comments/examples wrt acpi patching are good
> > enough)
> > (if it's not clear how to do it, ask me for pointers)
>
> That sounds a better approach, however...
>
> > 2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack
> > Register
> > patching instructions
> > bios_linker_loader_add_pointer(linker,
> > ACPI_BUILD_TABLE_FILE,
> > address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> >
> > ACPI_GHES_ERRORS_FW_CFG_FILE, source_id *
> > sizeof(uint64_t));
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > during build_ghes_v2() also store on a side mapping
> > source_id -> error address offset : read ack address
> >
> > so when you are injecting error, you'd at least use offsets
> > used at start time, to get rid of risk where injection code
> > diverge from HEST:etc/hardware_errors layout at start time.
> >
> > However to make migration safe, one would need to add a fat
> > comment not to change order ghest error sources in HEST _and_
> > a dedicated unit test to make sure we catch it when that happens.
> > bios_tables_test should be able to catch the change, but it won't
> > say what's wrong, hence a test case that explicitly checks order
> > and loudly & clear complains when we will break order assumptions.
> >
> > downside:
> > * we are are limiting ways HEST could be composed/reshuffled in
> > future
> > * consumption of extra CI resources
> > * and well, it relies on above duct tape holding all pieces
> > together
>
> I ended opting to do approach (2) on this changeset, as the current code
> is already using bios_linker_loader_add_pointer() for ghes, being deeply
> relying on the block address/ack and cper calculus.
I consider (2) as a fallback in case (1) can't be done with reasonable effort.
At this point, (1) looks doable and I'm not convinced that duct tape
is necessary and that we badly need to rush in this series.
Hence I'd strongly prefer (1).
See my other reply to Jonathan, setting write pointer is not hard.
Parsing HEST doesn't have to be a full tables parser as long as
it respects table types/length/revision then we can cheat by using
documented offsets from ACPI spec as is, for fields we
need to access.
> To avoid troubles on this duct tape, I opted to move all offset math
> to a single function at ghes.c:
>
> /*
> * ID numbers used to fill HEST source ID field
> */
> enum AcpiHestSourceId {
> ACPI_HEST_SRC_ID_SEA,
> ACPI_HEST_SRC_ID_GED,
>
> /* Shall be the last one */
> ACPI_HEST_SRC_ID_COUNT
> } AcpiHestSourceId;
>
> ...
>
> static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
> uint64_t *error_block_offset,
> uint64_t *ack_offset,
> uint64_t *cper_offset,
> enum AcpiHestSourceId *source_id)
> {
> enum AcpiHestSourceId source;
> uint64_t offset;
>
> switch (notify) {
> case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */
> source = ACPI_HEST_SRC_ID_SEA;
> break;
> case ACPI_GHES_NOTIFY_GPIO:
> source = ACPI_HEST_SRC_ID_GED;
> break;
> default:
> return true;
> }
>
> if (source_id) {
> *source_id = source;
> }
>
> /*
> * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
> * In summary, memory starts with error addresses, then acks and
> * finally CPER blocks.
> */
>
> offset = source * sizeof(uint64_t);
>
> if (error_block_offset) {
> *error_block_offset = offset;
> }
> if (ack_offset) {
> *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT *
> sizeof(uint64_t);
> }
> if (cper_offset) {
> *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) +
> source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> }
>
> return false;
> }
>
> I also removed the anonymous enum with SEA/GPIO source IDs, using
> only the ACPI notify type as arguments at the function calls.
>
> As there's now a single point where the offsets from
> docs/specs/acpi_hest_ghes.rst are enforced, this should be error
> prone.
>
> The code could later be changed to use approach (2), on a separate
> cleanup.
>
> Thanks,
> Mauro
>
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, (continued)
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Igor Mammedov, 2024/08/07
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Jonathan Cameron, 2024/08/07
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Igor Mammedov, 2024/08/08
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Mauro Carvalho Chehab, 2024/08/08
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Igor Mammedov, 2024/08/12
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Mauro Carvalho Chehab, 2024/08/13
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI, Mauro Carvalho Chehab, 2024/08/08
- Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI,
Igor Mammedov <=
[PATCH v5 4/7] acpi/ghes: Support GPIO error source, Mauro Carvalho Chehab, 2024/08/02
Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source, Igor Mammedov, 2024/08/06
[PATCH v5 1/7] arm/virt: place power button pin number on a define, Mauro Carvalho Chehab, 2024/08/02
[PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES, Mauro Carvalho Chehab, 2024/08/02