Re: [Qemu-devel] [PATCH 7/7] PPC64: Don't fault at lwsync


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 7/7] PPC64: Don't fault at lwsync
Date: Thu, 05 Mar 2009 17:09:38 +0100
User-agent: Thunderbird 2.0.0.18 (X11/20081112)

Alexander Graf wrote:
> Paul Brook wrote:
>   
>> On Thursday 05 March 2009, Alexander Graf wrote:
>>> Right now we can throw a fault on lwsync, even though the fault is
>>> actually caused by the instruction after lwsync.
>>>
>>> I haven't found the magic that messed this up, but for now we can
>>> just end the TB on lwsync, forcing the next command to issue faults
>>> itself.
>>>
>>> If anyone knows how to really fix this, please step forward and do
>>> so. This only makes things work at all for me :-).
>>
>> Where is the subsequent fault coming from? I suspect the real bug is nothing 
>> to do with lwsync, and the subsequent fault is actually just corrupting the 
>> CPU state. As discussed recently this is the same bug SPARC has with its 
>> unassigned access handlers.
>>
>> Paul
>
> Without the patch I get:
>
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc0000000000ba524
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA PowerMac
> Modules linked in:
> Supported: Yes
> NIP: c0000000000ba524 LR: c000000000775a0c CTR: c0000000007759e8
> REGS: c0000000061afb10 TRAP: 0300   Not tainted  (2.6.27.7-9-ppc64)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 84000044  XER: 20000000
> DAR: 0000000000000000, DSISR: 0000000040000000
> TASK = c00000000619d560[1] 'swapper' THREAD: c0000000061ac000 CPU: 0
> GPR00: ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
> GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
> GPR08: 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
> GPR12: 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
> GPR16: 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
> GPR20: 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
> GPR24: 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
> GPR28: c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
> NIP [c0000000000ba524] .cmpxchg_futex_value_locked+0x38/0x78
> LR [c000000000775a0c] .futex_init+0x24/0xac
> Call Trace:
> [c0000000061afd90] [c0000000007759c0] .init_tstats_procfs+0x2c/0x54 (unreliable)
> [c0000000061afe10] [c00000000000944c] .do_one_initcall+0x78/0x194
> [c0000000061aff00] [c000000000750440] .kernel_init+0xd0/0x148
> [c0000000061aff90] [c00000000002ad84] .kernel_thread+0x4c/0x68
> Instruction dump:
> 39290001 912b0014 7c8407b4 7ca507b4 e92d01b0 e8090520 7fa30040 419d0038
> e92d01b0 e8090520 2ba00003 409d0028 <7c2004ac> 7c001828 7c002000 40c20010
> ---[ end trace 561bb236c800851f ]---
> note: swapper[1] exited with preempt_count 1
> swapper used greatest stack depth: 9296 bytes left
> Kernel panic - not syncing: Attempted to kill init!
>
>
> Which is this translation block:
>
> NIP c0000000000ba524   LR c000000000775a0c CTR c0000000007759e8 XER 20000000
> MSR 8000000000009032 HID0 0000000060000000  HF 8000000000000000 idx 1
> TB 00000000 d8b159bb DECR 0007c417
> GPR00 ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
> GPR04 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
> GPR08 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
> GPR12 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
> GPR16 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
> GPR20 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
> GPR24 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
> GPR28 c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
> CR 84000044  [ L  G  -  -  -  -  G  G  ]             RES ffffffffffffffff
> FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPSCR 00000000
> SRR0 c000000000774950 SRR1 8000000000009032 SDR1 0000000007c00003
> IN:
> 0xc0000000000ba524:  lwsync
> 0xc0000000000ba528:  lwarx   r0,0,r3
> 0xc0000000000ba52c:  cmpw    r0,r4
> 0xc0000000000ba530:  bne-    0xc0000000000ba540
>
>
> And I seriously have trouble understanding how a data storage exception
> could happen on the lwsync opcode. It looks like R3 became 0 from the
> guest's point of view after lwsync though - hum.
>   

Ah, I remember that one now :-). The futex_init function tests whether cmpxchg
works with a NULL pointer, and that's why R3 is 0. It's actually _supposed_
to fault here. But something gets messed up when the fault is reported with
IP=lwsync instead of IP=lwarx, and I haven't really looked into why.

Alex



