Re: [Qemu-devel] State of ARM FIQ in Qemu

From:	Tim Sander
Subject:	Re: [Qemu-devel] State of ARM FIQ in Qemu
Date:	Thu, 13 Nov 2014 17:26:28 +0100
User-agent:	KMail/4.14.2 (Linux/3.16.3; KDE/4.14.2; x86_64; ; )

On Thursday, 13 November 2014, 09:09:33, Greg Bellows wrote:
> On 13 November 2014 07:58, Tim Sander <address@hidden> wrote:
> > On Wednesday, 12 November 2014, 10:00:03, Greg Bellows wrote:
> > > On 12 November 2014 07:56, Tim Sander <address@hidden> wrote:
> > > > Hi Greg
> > > > 
> > > > > > Bad mode in data abort handler detected
> > > > > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > > > > Modules linked in: firq(O) ipv6
> > > > > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > > > > > 
> > > > > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > > > > PC is at 0xffff1240
> > > > > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > > > > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > > > > > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > > > > > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > > > > > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > > > > > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > > > > > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> > > > > 
> > > > > It looks like we are in FIQ mode and interrupts have been masked.
> > > > 
> > > > Indeed.
> > > > 
> > > > > > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > > > > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > > > > Stack: (0xbf363e70 to 0xbf364000)
> > > > > > 3e60:                                     bf0083c0 00000000 0000002f 80230d04
> > > > > > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
> > > > > > 3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
> > > > > > 3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
> > > > > > 3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
> > > > > > 3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
> > > > > > 3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
> > > > > > 3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
> > > > > > 3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
> > > > > > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
> > > > > > 3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
> > > > > > 3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
> > > > > > 3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
> > > > > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> > > > 
> > > > > It certainly looks like we are going down the standard IRQ path as you
> > > > > suggested.  I'm not a Linux driver guy, but do you see any kind of
> > > > > activity (break points, printfs, ...) through your FIQ handler?
> > > > 
> > > > I am reaching 0xffff1224 which I believe is the FIQ vector address on
> > > > the vexpress?
> > > 
> > > Hmmm.... not sure.  As you mentioned previously (and as seen in the above
> > > register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
> > > I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
> > > that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
> > > occupy the HYP trap vector.
> > 
> > Zounds! You're right, i think this was a typo in my debug script. Which i
> > didn't notice. But i am even reaching 0x1240 before but not 0x1244 which
> > means
> 
> I wouldn't expect it to reach 0x1244 as that is the word after what I
> believe should be a branch at 0x1240 to the FIQ handler.  This would mean
> we are not overrunning the vector table though.
There is a reason that the FIQ entry is the last one in the vector table: it
is allowed to put code directly at the FIQ vector, which saves one jump.
That is what set_fiq_handler in arch/arm/kernel/fiq.c in Linux assumes. It
takes a binary blob and copies it directly into the FIQ handler space.
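
As a rough illustration of the copy described above, here is a minimal
user-space sketch. It is not the kernel's code: the buffer, the sizes and the
name install_fiq_blob are hypothetical, and the offset 0x1240 is only assumed
from the addresses seen in this thread (0xffff1240 relative to 0xffff0000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for this sketch; not taken from the kernel sources. */
#define VECTORS_SIZE 0x2000u   /* assumed: two 4 KiB pages */
#define FIQ_OFFSET   0x1240u   /* assumed from 0xffff1240 - 0xffff0000 */

/* Plain memory standing in for the kernel's vectors page. */
static uint8_t vectors[VECTORS_SIZE];

/* Copy a handler blob into the FIQ slot, in the spirit of what
 * set_fiq_handler() is described as doing.
 * Returns 0 on success, -1 if the blob would run past the buffer. */
static int install_fiq_blob(const void *start, size_t length)
{
    assert(start != NULL);
    if (length > VECTORS_SIZE - FIQ_OFFSET)
        return -1;
    memcpy(vectors + FIQ_OFFSET, start, length);
    /* On real hardware the instruction cache would also have to be
     * synchronised for this range; that step is omitted here. */
    return 0;
}
```

Because the blob is executed in place, whatever ends up at that offset is the
first instruction the CPU runs in FIQ mode, so reading the memory back after
the copy is a cheap sanity check.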

> > it aborts on the first FIQ instructions. Here is the "-d int" output
> > directly after the FIQ hits:
> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x800c8dcc  //kmem_cache_alloc
> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x8001be00 //v7_pabort
> > Taking exception 3 [Prefetch Abort]
> > and then it continues to fail on v7_pabort repeatedly. This shows that
> > there is something fishy going on. It is failing on the presumed handler
> > for the prefetch abort? But as I see earlier resolved prefetch abort
> > errors I can conclude that it works up to the point where the CPU is in
> > FIQ mode. FIQ is special in a way that static mapped memory is needed to
> > avoid a page lookup as this fails under Linux in FIQ mode. But 0x800c8dcc
> > (kmem_cache_alloc) is not called in the FIQ handler which obviously can't
> > use any Linux infrastructure. And as I do not reach the breakpoint
> > 0xffff1244 these misses happen on the execution of the first address of
> > the FIQ handler.
> 
> Can we check the vector table to see if the FIQ entry is as expected?  It
> appears that the pabort may be in the right place, but it would be good to
> see if the FIQ entry is correct (branching to right place).  I'd expect
> that we should be branching to __fiq_svc?  Maybe setting a breakpoint in
> the first level handler may be useful?
Ok, I was also digging into this. I thought I had checked this already, but
alas it seems the values are not the ones I put into set_fiq_handler. So I dug
a little deeper and tried to find out the values of VBAR, MVBAR and HVBAR.

This is the GCC inline assembly from my kernel module, written in C:
asm("mrc p15, 0, %0, c12, c0, 0" : "=r"(vbar) : : "cc");
asm("mrc p15, 0, %0, c12, c0, 1" : "=r"(mvbar) : : "cc");  /* not implemented? */
asm("mrc p15, 4, %0, c12, c0, 0" : "=r"(hvbar) : : "cc");  /* not implemented? */

It seems as if neither MVBAR nor HVBAR is implemented and that VBAR returns
zero!? I also have a problem with the addresses: the FIQ handler lies at
0xffff1240, but vectors_page in Linux points to 0xbfffe000? You were talking
about the fact that the security extensions were not implemented. I was not
aware that the different VBARs were already part of the security extensions?
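
One possible reading of the zero VBAR, offered as a sketch and not as a
diagnosis: on ARMv7 the SCTLR.V bit (bit 13) selects the high exception
vectors at 0xffff0000, and VBAR is only used while that bit is clear. The
"Control:" value in the oops above (0x10c53c7d), which I take to be the SCTLR,
has bit 13 set. The helper name exception_base and the 0x1240 offset are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_V       (1u << 13)    /* high exception vectors select */
#define HIVECS_BASE   0xffff0000u   /* architectural high-vectors base */
#define FIQ_STUB_OFF  0x1240u       /* assumed from the addresses in this thread */

/* Return the exception base the CPU would use for a given SCTLR and VBAR.
 * With SCTLR.V set the base is fixed at 0xffff0000 and VBAR is not used. */
static uint32_t exception_base(uint32_t sctlr, uint32_t vbar)
{
    return (sctlr & SCTLR_V) ? HIVECS_BASE : vbar;
}
```

With the values from the dump, exception_base(0x10c53c7d, 0) gives 0xffff0000,
and adding the assumed offset gives the 0xffff1240 seen in the PC. Under that
reading, 0xbfffe000 might simply be another kernel mapping of the same pages,
but this would need to be checked against the kernel's memory setup.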

> > > > > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > > > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > > > > ---[ end trace 3dc3571209a017e1 ]---
> > > > > > Kernel panic - not syncing: Fatal exception in interrupt
> > > > > 
> > > > > It is hard to determine entirely what is happening here based on
> > > > > this info.  I do have code of my own that routes KGDB interrupts as
> > > > > FIQs and with the workaround I see the FIQs handled as expected.
> > > > > Some things we can try to get more info in hopes of pinpointing
> > > > > where to look:
> > > > >    1. At the top of hw/intc/arm_gic.c there is the following
> > > > >    commented out line:
> > > > >        //#define DEBUG_GIC
> > > > >    
> > > > >    Uncomment the line, rebuild and rerun.  This will give us some
> > > > >    trace on what is going through the GIC code.
> > > > 
> > > > I have commented out some debug lines but i see:
> > > > Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
> > > > 120                         DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);
> > > > 
> > > > With the expected irq nr. 49 (32+17).
> > > > 
> > > > >    2. Run qemu with the "-d int" option which will print a message
> > > > >    on each interrupt.  We should see an FIQ at some point if they
> > > > >    are occurring.  The only issue is that there will be numerous
> > > > >    IRQs, so you'll have to parse through them to find an
> > > > >    "exception 6 [FIQ]".
> > > > 
> > > > Here is the relevant output when the FIQ hits:
> > > > Taking exception 2 [SVC]
> > > > Taking exception 2 [SVC]
> > > > pml: pml_timer_tick: raise_irq
> > > > arm_gic: Raised pending FIQ 49 (cpu 0)
> > > > Taking exception 6 [FIQ]
> > > 
> > > This looks to me like the GIC has caught the interrupt and communicated it
> > > to the CPU causing it to take the FIQ exception.
> > > 
> > > > pml: pml_write: update control flags: 1
> > > > pml: pml_update: start timer
> > > > pml: pml_update: lower irq
> > > > pml: pml_read: read magic
> > > > pml: pml_write: update control flags: 3
> > > > pml: pml_update: start timer
> > > 
> > > Is pml your test driver?  It looks like it initiates the interrupt and
> > > possibly performs some handling following it?
> > 
> > Yes, it's just a simple set of registers to control an interrupt. I added
> > debug output to this driver to see if and when the FIQ is accessing the
> > registers. But I see no accesses from FIQ mode.
> > 
> > > > Taking exception 3 [Prefetch Abort]
> > > > ...with IFSR 0x5 IFAR 0x80221d70
> > > > Taking exception 4 [Data Abort]
> > > > ...with DFSR 0x805 DFAR 0x805c604c
> > > > Taking exception 4 [Data Abort]
> > > > ...with DFSR 0x805 DFAR 0x805c604c
> > > > Taking exception 4 [Data Abort]
> > > > 
> > > > So the FIQ is hitting but unfortunately I have no idea where the data
> > > > aborts are coming from.
> > > 
> > > The data aborts are likely a side effect of the prefetch abort taken before
> > > them; it is the interesting one.
> > 
> > Still as above the address is odd. In FIQ mode it should not jump to this
> > address at all !?! This is definitely Linux memory space and I am not
> > calling anything Linux related from FIQ.
> 
> I'm a bit confused as it appears the exception pattern has changed.
> Previously, we were seeing pabt, dabt, dabt, ..., but then up above the
> output is pabt, pabt, pabt, ... .  So, either we are jumping somewhere
> random thus breaking repeatability or something else changed?  This is also
> reflected in the A15 output below, but its different.
Hm, it seems that the first Prefetch Abort IFAR is random, and the IFSR is
also different:
First run:
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76f6a5e4
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fcc25c
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76e7f884
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76e7e61c
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76ee5b40
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fce060
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fd245c
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fcdf88

Second run:
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]

Third run:
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 2 [SVC]
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76ea8000
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 4 [Data Abort]
...with DFSR 0x817 DFAR 0x76d20000
Taking exception 2 [SVC]
Taking exception 2 [SVC]

But then again, I think that the kernel and QEMU disagree about the position
of the vector table. So it seems it is just accessing random addresses in the
interrupt, which leads to different results each time.
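
For reading the IFSR/DFSR values in the runs above, here is a small decoding
sketch. It assumes the ARMv7 short-descriptor format (no LPAE), where the
fault status FS[4:0] is bit 10 plus bits 3..0, and bit 11 of the DFSR is the
write-not-read flag. The function names are made up for this example, and only
the two status codes that occur in the logs are named.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the 5-bit fault status FS[4:0] from a short-descriptor
 * IFSR/DFSR value: FS[4] is bit 10, FS[3:0] are bits 3..0. */
static unsigned fault_status(uint32_t fsr)
{
    return (((fsr >> 10) & 1u) << 4) | (fsr & 0xfu);
}

/* DFSR only: bit 11 (WnR) is set when the abort was caused by a write. */
static int dfsr_is_write(uint32_t dfsr)
{
    return (int)((dfsr >> 11) & 1u);
}

/* Name the two encodings that appear in the logs; anything else is
 * reported as "other" instead of being guessed at. */
static const char *fault_name(uint32_t fsr)
{
    switch (fault_status(fsr)) {
    case 0x05: return "translation fault, first level (section)";
    case 0x07: return "translation fault, second level (page)";
    default:   return "other";
    }
}
```

Decoded this way, 0x5 and 0x805 are section translation faults (the latter on
a write), while 0x17 and 0x817 are page translation faults, so the difference
between the earlier and the later runs is in the translation level at which
the lookup failed.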

So I will just outline what I do in the kernel to get a FIQ from the GIC:
* I create some static mappings to make sure I get no page fault in FIQ mode
* Then I reprogram the GIC so that all IRQs are mapped to group 1
* except for one special IRQ, which is programmed to group 0
* FIQ mode is enabled for group 0
It seems as if the GIC implementation in QEMU knows this dance, as I am seeing
the FIQ happening. But then I have my doubts (due to the missing VBARs and the
different addresses kernel vs. QEMU) that the CPU is up to the task in QEMU?
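
The GIC steps listed above can be sketched against a simulated register file
as follows. This is not my driver code: struct gic_sim, gic_route_fiq and the
interrupt count are hypothetical, and real code would write to the
memory-mapped GICD_IGROUPRn and GICC_CTLR registers. The bit positions used
(EnableGrp0 = bit 0, EnableGrp1 = bit 1, FIQEn = bit 3, one group bit per
interrupt with 1 meaning group 1) follow the GICv2 architecture specification.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_MAX_IRQS      96u         /* hypothetical number of interrupt lines */
#define GICC_CTLR_ENGRP0  (1u << 0)   /* forward group 0 interrupts */
#define GICC_CTLR_ENGRP1  (1u << 1)   /* forward group 1 interrupts */
#define GICC_CTLR_FIQEN   (1u << 3)   /* signal group 0 interrupts as FIQ */

/* Simulated register file standing in for the memory-mapped GIC. */
struct gic_sim {
    uint32_t igroupr[GIC_MAX_IRQS / 32];  /* GICD_IGROUPRn: one bit per IRQ */
    uint32_t gicc_ctlr;                   /* GICC_CTLR */
};

/* Put every interrupt into group 1 except fiq_irq, which goes to group 0,
 * then enable both groups and have group 0 signalled as FIQ. */
static void gic_route_fiq(struct gic_sim *gic, unsigned fiq_irq)
{
    assert(fiq_irq < GIC_MAX_IRQS);
    for (unsigned i = 0; i < GIC_MAX_IRQS / 32; i++)
        gic->igroupr[i] = 0xffffffffu;                      /* all group 1 */
    gic->igroupr[fiq_irq / 32] &= ~(1u << (fiq_irq % 32));  /* this one: group 0 */
    gic->gicc_ctlr = GICC_CTLR_ENGRP0 | GICC_CTLR_ENGRP1 | GICC_CTLR_FIQEN;
}
```

For interrupt 49 (32+17) this clears bit 17 of the second group register and
leaves every other group bit set.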

The code I am using has been ported from an Altera SoC Cortex-A9 and works
there. While some addresses were hardcoded, I *think* that I have meanwhile
found all the wrong addresses in the static mappings which bit me earlier. But
now I have the impression that I have to dig on the QEMU side again.

Best regards
Tim
