Re: mac99 SMP

qemu-ppc

From:	Andrew Randrianasulu
Subject:	Re: mac99 SMP
Date:	Thu, 6 Mar 2025 18:57:35 +0300
On Thu, Mar 6, 2025 at 6:41 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
> > Thu, 6 Mar 2025, 18:16 BALATON Zoltan <balaton@eik.bme.hu>:
> >> On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
> >>> On Thu, Mar 6, 2025 at 4:12 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> >>>>
> >>>> On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
> >>>>> Thu, 6 Mar 2025, 05:10 BALATON Zoltan <balaton@eik.bme.hu>:
> >>>>>
> >>>>>> On Thu, 6 Mar 2025, Andrew Randrianasulu wrote:
> >>>>>>> On Thu, Mar 6, 2025 at 2:02 AM Andrew Randrianasulu
> >>>>>>> <randrianasulu@gmail.com> wrote:
> >>>>>>>> On Thu, Mar 6, 2025 at 12:21 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> >>>>>>>>> So is that the ISI that I saw? Line 308 is the end of the DSI handler
> >>>>>>>>> but the log name shows the ISI handler. But you had no ISI logs with
> >>>>>>>>> -d int, so I don't get it. What are the registers of that CPU at that
> >>>>>>>>> point? One of those should tell where it got to the ISI handler from,
> >>>>>>>>> but the backtrace does not show that. (Check the CPU docs for which
> >>>>>>>>> register has the address that caused the exception, I don't remember.)
> >>>>>>>>>
> >>>>>>>>>> This was the case when I set the sstep bits to 0x1: it does not bring
> >>>>>>>>>> up the second CPU AND does not single step into the irq_save function
> >>>>>>>>>> in core99_kick_cpu :(
> >>>>>>>>>>
> >>>>>>>>>> So I assumed the least impactful mode (0x1) is not very useful for
> >>>>>>>>>> detailed single stepping into this specific function.
> >>>>>>>>>>
> >>>>>>>>>> But 0x3, 0x5 and the default 0x7 all work, as far as single stepping
> >>>>>>>>>> and bringing up the secondary CPU are concerned.
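The `Qqemu.sstep` value discussed above is a bit mask. Below is a minimal Python sketch that decodes it, assuming the flag values used by QEMU's gdbstub (`SSTEP_ENABLE` = 0x1, `SSTEP_NOIRQ` = 0x2, `SSTEP_NOTIMER` = 0x4, default 0x7); the helper names are illustrative only. Under that assumption, 0x1 enables stepping without masking interrupts or timers, 0x3 additionally masks interrupts, and 0x5 additionally freezes timers.

```python
# Flag values assumed from QEMU's gdbstub (SSTEP_ENABLE, SSTEP_NOIRQ,
# SSTEP_NOTIMER); check the source of your QEMU version before relying on them.
SSTEP_FLAGS = {
    0x1: "ENABLE",   # single stepping enabled
    0x2: "NOIRQ",    # do not deliver interrupts while stepping
    0x4: "NOTIMER",  # do not run timers while stepping
}


def decode_sstep(mask: int) -> list[str]:
    """Return the names of the flags set in a Qqemu.sstep mask, lowest bit first."""
    return [name for bit, name in sorted(SSTEP_FLAGS.items()) if mask & bit]


def sstep_packet(mask: int) -> str:
    """Build the packet text passed to gdb's 'maintenance packet' command."""
    return f"Qqemu.sstep={mask:#x}"


for m in (0x1, 0x3, 0x5, 0x7):
    print(sstep_packet(m), decode_sstep(m))
```

In gdb the resulting string would be sent as `maintenance packet Qqemu.sstep=0x7`, as done later in this thread with 0x1.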
> >>>>>>>>>
> >>>>>>>>> You were stepping through CPU0 but the interesting part is what CPU1 is
> >>>>>>>>> doing, so maybe try to trace that:
> >>>>>>>>>
> >>>>>>>>> thread 2
> >>>>>>>>> b *0x100
> >>>>>>>>
> >>>>>>>> It advances to 0x400 (second cpu) but no further than this
> >>>>>>
> >>>>>> 0x400 is the ISI vector, so it seems it hits that for some reason, which
> >>>>>> is the same as I saw with -d int,mmu, and it probably shouldn't get those
> >>>>>> before it copies the MMU setup from CPU0.
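The breakpoint and fault addresses in this exchange (0x100, 0x400) are PowerPC exception vectors. A small lookup table can help when reading gdb or `-d int` output. This sketch lists only the architected vectors of classic 32-bit PowerPC (6xx/7xx) and also accepts addresses carrying the 0xFFF00000 base used when MSR[IP] is set; implementation-specific vectors are deliberately left out, and `vector_name` is a hypothetical helper.

```python
from typing import Optional

# Architected exception vector offsets on classic 32-bit PowerPC (6xx/7xx).
# Implementation-specific vectors (performance monitor, AltiVec, ...) are omitted.
VECTORS = {
    0x0100: "System reset",
    0x0200: "Machine check",
    0x0300: "DSI (data storage)",
    0x0400: "ISI (instruction storage)",
    0x0500: "External interrupt",
    0x0600: "Alignment",
    0x0700: "Program",
    0x0800: "Floating-point unavailable",
    0x0900: "Decrementer",
    0x0C00: "System call",
    0x0D00: "Trace",
}

# Vector base used when MSR[IP] is set.
IP_BASE = 0xFFF00000


def vector_name(addr: int) -> Optional[str]:
    """Name the exception vector that addr points at, or None if it is not one."""
    if (addr & IP_BASE) == IP_BASE:
        addr -= IP_BASE
    return VECTORS.get(addr)


for a in (0x100, 0x400, 0x800, 0xFFF00100):
    print(hex(a), vector_name(a))
```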
> >>>>>>
> >>>>>>>> see gdb log
> >>>>>>>>
> >>>>>>>> Note that I used
> >>>>>>>>
> >>>>>>>> maintenance packet Qqemu.sstep=0x1
> >>>>>>>> sending: Qqemu.sstep=0x1
> >>>>>>>> received: "OK"
> >>>>>>>>
> >>>>>>>> and pressed Ctrl-C on the first thread when it failed to single step
> >>>>>>>> (single stepping on thread 2/CPU1 was already impossible)
> >>>>>>>>
> >>>>>>>> It shows a different state but no obvious function pointers :/
> >>>>>>>
> >>>>>>> Aw, this one was without second qemu in mttcg mode :/
> >>>>>>>
> >>>>>>> New gdb log attached
> >>>>>>
> >>>>>> As far as I understand, it shows CPU0 waiting for CPU1 to set its
> >>>>>> call_in_map entry (that's OK and expected) while CPU1 is getting ISIs or
> >>>>>> some other exceptions (which it likely shouldn't get), but I still don't
> >>>>>> see how far CPU1 got in its init code and what triggers these ISIs. When
> >>>>>> in the ISI handler, there's a register that holds the fault address, i.e.
> >>>>>> where it jumped there from. I told you to check the PPC manual for that; I
> >>>>>> looked it up now and it's SRR0 ("Set to the effective address of the
> >>>>>> instruction that the processor would have attempted to execute next if no
> >>>>>> exception conditions were present (if the exception occurs on attempting
> >>>>>> to fetch a branch target, SRR0 is set to the branch target address)").
> >>>>>> What code does that address belong to? That's what is causing the ISIs,
> >>>>>> and we should find out why.
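To find which code an SRR0 value belongs to, gdb's `info symbol` command can be given the address directly. The same lookup done offline against a `System.map` or `nm vmlinux` listing is sketched below: a binary search for the nearest symbol at or below the address. Apart from the `InstructionAccess_virt` address that gdb printed in this thread, the sample symbols, addresses and type letters are hypothetical.

```python
import bisect
from typing import Optional


def parse_system_map(text: str) -> list[tuple[int, str]]:
    """Parse 'address type name' lines (System.map / nm format), sorted by address."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:  # skip lines without an address (e.g. undefined symbols)
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def symbolize(addr: int, syms: list[tuple[int, str]]) -> Optional[str]:
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    starts = [start for start, _ in syms]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = syms[i]
    return f"{name}+{addr - start:#x}"


# Hypothetical System.map excerpt for illustration.
SAMPLE = """\
c0000000 T _stext
c000439c t InstructionAccess_virt
c0004400 t hypothetical_next_symbol
"""

syms = parse_system_map(SAMPLE)
print(symbolize(0xC000439C, syms))
```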
> >>>>>>
> >>>>>
> >>>>> Thanks for looking it up
> >>>>>
> >>>>> At the very first moment, when the 0x100 breakpoint is hit and gdb
> >>>>> auto-switches to thread 2, it shows empty r0-r31 and
> >>>>>
> >>>>> srr0           0x100               256
> >>>>>
> >>>>> At the next "step" (not really a step, because single stepping starts
> >>>>> runaway execution with the sstep bits set to 0x1)
> >>>>>
> >>>>> srr0           0xc000439c          -1073724516
> >>>>>
> >>>>> same as pc (program counter)
> >>>>>
> >>>>> pc             0xc000439c          0xc000439c <InstructionAccess_virt>
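gdb shows SRR0 both as hex and as the signed decimal -1073724516, presumably because the register is described to gdb as a signed 32-bit integer. Both columns are the same bit pattern. A sketch of the two's-complement conversion between them:

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def to_unsigned32(value: int) -> int:
    """Map a (possibly negative) 32-bit integer back to its unsigned bit pattern."""
    return value & 0xFFFFFFFF


srr0 = 0xC000439C
print(to_signed32(srr0))                # -1073724516, as in the gdb output
print(hex(to_unsigned32(-1073724516)))  # 0xc000439c
```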
> >>>>>
> >>>>> So it is already in its bad state? Not sure how to get any in-between
> >>>>> state?
> >>>>>
> >>>>> Maybe enable the normal sstep bits (0x7) just after CPU1 hits its
> >>>>> breakpoint, and then single step?
> >>>>
> >>>> You can step by assembly instruction when there's no source or line
> >>>> number info with the gdb "stepi" command; with that you should be able to
> >>>> step through assembly code.
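A sourced gdb script can drive `stepi` repeatedly and disassemble the current instruction after each step with `x/i $pc`. This sketch only generates such a command file; `stepi_trace_script` is a hypothetical helper, and the thread number and step count are parameters you would adjust.

```python
def stepi_trace_script(thread: int, steps: int) -> str:
    """Build gdb commands that select `thread`, then step it `steps` times
    by machine instruction, disassembling the instruction at $pc each time."""
    lines = [f"thread {thread}", "x/i $pc"]
    for _ in range(steps):
        lines += ["stepi", "x/i $pc"]
    return "\n".join(lines) + "\n"


# Print a short trace script for CPU#1 (gdb thread 2).
print(stepi_trace_script(thread=2, steps=3), end="")
```

Saved to a file, the output can be loaded at the gdb prompt with `source FILE`. Note that gdb stops executing a sourced file at the first command that fails.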
> >>>
> >>> Thanks! I replaced thread 1 / step with stepi on thread 2
> >>>
> >>> It ended up with
> >>>
> >>> * 2    Thread 1.2 (CPU#1 [running]) 0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
> >>> ++thread 2
> >>> [Switching to thread 2 (Thread 1.2)]
> >>> #0  0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
> >>> 375             b       interrupt_return
> >>> ++backtrace
> >>> #0  0xc0006d30 in vmap_stack_overflow_virt () at arch/powerpc/kernel/head_book3s_32.S:375
> >>> #1  0x00000000 in ?? ()
> >>
> >> This does not make much sense because
> >> arch/powerpc/kernel/head_book3s_32.S:375 is the end of the FPU unavailable
> >> exception handler (which probably should not happen) and has nothing to do
> >> with vmap_stack_overflow_virt() (I don't know where that is, as it's not
> >> present in the older Linux sources I was looking at). In any case, it looks
> >> like the problem is that unexpected exceptions are happening that cause
> >> CPU1 to interrupt its code execution, jump to uninitialised or wrong
> >> vectors, and fail to init correctly. I don't know whether this is because
> >> on a real machine this would run from cache and wouldn't cause exceptions,
> >> or because something is not correctly emulated, so I don't know how to fix
> >> it. I remember a similar problem with MorphOS, which worked on a real
> >> machine but caused problems in QEMU; it could be prevented by turning off
> >> the MSR DR and IR bits until the exception vectors were correctly set up.
> >> In that case, OpenBIOS enabled these bits and MorphOS did not disable them
> >> before trying to change the exception vectors. Here, however, I think the
> >> second CPU should start after a reset with these bits disabled, so the
> >> question is what enables them in Linux, and is it ready to take exceptions
> >> at that point?
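To check whether CPU1 has address translation enabled when it takes these exceptions, the MSR can be decoded (on classic 32-bit PowerPC, SRR1 also saves a copy of these low MSR bits when an exception is taken). The masks below match the Linux kernel's `MSR_*` definitions for these bits; `without_translation()` is a hypothetical helper showing the MorphOS-style workaround of clearing IR and DR.

```python
# MSR bit masks for classic 32-bit PowerPC. The values match the Linux
# kernel's MSR_* definitions (arch/powerpc/include/asm/reg.h).
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available (clear: FP use raises FP unavailable)
    "ME": 0x1000,  # machine check enabled
    "IP": 0x0040,  # exception prefix: vectors at 0xFFF00000 instead of 0
    "IR": 0x0020,  # instruction address translation
    "DR": 0x0010,  # data address translation
    "RI": 0x0002,  # recoverable interrupt
}


def decode_msr(msr: int) -> list[str]:
    """Return the names of the bits from MSR_BITS that are set in msr."""
    return [name for name, mask in MSR_BITS.items() if msr & mask]


def without_translation(msr: int) -> int:
    """Hypothetical helper: msr with IR and DR cleared (real mode)."""
    return msr & ~(MSR_BITS["IR"] | MSR_BITS["DR"])


msr = 0x1032
print(decode_msr(msr))                # ['ME', 'IR', 'DR', 'RI']
print(hex(without_translation(msr)))  # 0x1002
```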
> >
> >
> > Well, CPU0 starts OK at least...
> >
> > Can we just add a disassembly command to the gdb script?
>
> You can try
>
> display x/i $pc

++display x/i $pc
my-qemu-gdb-script.gdb:27: Error in sourced command file:
No symbol "x" in current context.



>
> or something like that to get a disassembly for each prompt I think.
>
> Regards,
> BALATON Zoltan
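The error above comes from the argument syntax: `display` takes its format as a `/FMT` suffix on the command itself, so `display x/i $pc` is parsed as the expression `x/i $pc` and gdb looks for a symbol named `x`. The form documented in the gdb manual for showing the next instruction at every stop is `display/i $pc`. If several scripts contain the mistaken form, a small hypothetical rewrite helper is enough:

```python
import re

# Matches the mistaken "display x/FMT EXPR" form, capturing indentation,
# the format letters (e.g. "i", "4i") and the expression.
_X_STYLE_DISPLAY = re.compile(r"^(\s*)display\s+x/(\w+)\s+(.+)$")


def fix_display(line: str) -> str:
    """Rewrite 'display x/FMT EXPR' to gdb's 'display/FMT EXPR'; other lines pass through."""
    m = _X_STYLE_DISPLAY.match(line)
    if m is None:
        return line
    indent, fmt, expr = m.groups()
    return f"{indent}display/{fmt} {expr}"


print(fix_display("display x/i $pc"))  # display/i $pc
```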