bug-hurd

Re: Recent patches break ACPI tables


From: luca
Subject: Re: Recent patches break ACPI tables
Date: Mon, 19 Jun 2023 21:40:20 +0200

On 19/06/23 20:35, Almudena Garcia wrote:
But the code that starts the secondary CPUs runs much later than the crash.

So the crash could be caused by reading the ACPI tables, which are
supposed to be in a certain memory region, defined by a physical address.

phystokv will not fully solve the problem, because the lapic address is outside
the range allowed by this function. Currently, we use paging to map
every ACPI table that we need to access (to get a virtual address for it).

But the search for the initial ACPI address is based on a physical address range.

I could go a bit further with debugging, and it seems that the problem is a bit different: removing the 1:1 map exposed an issue that had been hidden so far.

In my test the CPU is reset by a triple fault (you can see this by enabling interrupt and cpu_reset logging in qemu, e.g. with -d int,cpu_reset), which is triggered after the first call to splvm:

(gdb) bt
#0  splvm () at ../i386/i386/spl.S:122
#1  0xc1001da6 in pmap_enter (pmap=<optimized out>, v=<optimized out>, pa=<optimized out>, prot=<optimized out>, wired=<optimized out>) at ../i386/intel/pmap.c:2171
#2  0xc1029b99 in pmap_steal_memory (size=<optimized out>) at ../vm/vm_resident.c:278
#3  0xc1029c48 in vm_page_bootstrap (startp=<optimized out>, endp=<optimized out>) at ../vm/vm_resident.c:207
#4  0xc101b893 in vm_mem_bootstrap () at ../vm/vm_init.c:65
#5  0xc10161d1 in setup_main () at ../kern/startup.c:115
#6 0xc1004652 in c_boot_entry (bi=<optimized out>) at ../i386/i386at/model_dep.c:578
#7  0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
(gdb) si
124             cli
1: x/i $pc
=> 0xc100ac5d <splvm+5>:       cli
(gdb)
125             CPU_NUMBER(%edx)
1: x/i $pc
=> 0xc100ac5e <splvm+6>:       mov    %cs:0xc109bc6c,%edx
(gdb)
0xc100ac65      125             CPU_NUMBER(%edx)
1: x/i $pc
=> 0xc100ac65 <splvm+13>:      mov    %cs:0x20(%edx),%edx
(gdb)
t_page_fault () at ../i386/i386/locore.S:435
435             pushl   $(T_PAGE_FAULT)         /* mark a page fault trap */
1: x/i $pc
=> 0xc100a42c <t_page_fault>:  push   $0xe

... and here it will enter t_page_fault recursively, because trap_from_kernel contains another CPU_NUMBER. I guess the triple fault is triggered because at some point the exception stack overflows.
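
The guess above can be illustrated with a toy model in C. This is only a sketch, not kernel code: the function name and all sizes are hypothetical. It assumes every entry into the handler pushes one frame and then faults again before returning.

```c
#include <stddef.h>

/* Toy model, not kernel code: each entry into the fault handler pushes a
 * frame and then immediately faults again (as a CPU_NUMBER access through
 * an unmapped address would). All sizes are hypothetical. */
static size_t nested_faults_until_overflow(size_t stack_bytes, size_t frame_bytes)
{
    size_t used = 0;
    size_t depth = 0;

    if (frame_bytes == 0)
        return 0;               /* avoid looping forever in the model */

    while (used + frame_bytes <= stack_bytes) {
        used += frame_bytes;    /* trap frame pushed on handler entry */
        depth++;                /* handler faults again before returning */
    }
    return depth;               /* the next push would no longer fit */
}
```

With any finite stack the nesting depth is bounded, so a handler that can never complete eventually runs out of room, which would match the observed reset.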

With --enable-ncpu=2 it seems that CPU_NUMBER is:

#define CPU_NUMBER(reg) \
        movl    %cs:lapic, reg          ;\
        movl    %cs:APIC_ID(reg), reg   ;\
        shrl    $24, reg                ;\

and at this stage the lapic pointer is not yet initialized:

(gdb) p lapic
$4 = (volatile ApicLocalUnit *) 0x0
(gdb) x &lapic
0xc109bc6c <lapic>:       0x00000000

I guess this worked so far because address 0 was mapped, and now it isn't.
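
To make the failing access concrete, here is a small C sketch that mirrors the three instructions of CPU_NUMBER. The struct and function names are hypothetical stand-ins, not the gnumach definitions; the 0x20 offset is taken from the disassembly (mov %cs:0x20(%edx),%edx).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the local APIC register page; only the
 * ID register at offset 0x20 is modelled. */
struct fake_lapic {
    uint8_t  pad[0x20];   /* registers below the ID register */
    uint32_t apic_id;     /* the macro extracts bits 31..24 */
};

/* C equivalent of CPU_NUMBER: load the ID register through the
 * pointer, then shift right by 24. */
static uint32_t cpu_number_model(const volatile struct fake_lapic *lapic_ptr)
{
    uint32_t id_reg = lapic_ptr->apic_id;
    return id_reg >> 24;
}

/* Address that the second load touches for a given pointer value. */
static uintptr_t cpu_number_load_address(uintptr_t lapic_value)
{
    return lapic_value + offsetof(struct fake_lapic, apic_id);
}
```

With lapic still 0x0, cpu_number_load_address(0) gives 0x20, so the second movl reads from the first page. That read only succeeds while the page at address 0 is mapped.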

I'm not sure what the proper way to solve this would be. I tried moving the call to machine_init() earlier, before vm_mem_bootstrap() (so that lapic is initialized), but this triggers another assert.

Any ideas?


Luca



