[Qemu-devel] Re: NMI handling


From: Blue Swirl
Subject: [Qemu-devel] Re: NMI handling
Date: Mon, 26 Jul 2010 17:57:56 +0000

On Mon, Jul 26, 2010 at 4:53 PM, Artyom Tarasenko
<address@hidden> wrote:
> 2010/6/21 Artyom Tarasenko <address@hidden>:
>> 2010/5/25 Blue Swirl <address@hidden>:
>>>>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>>>>
>>>> What does indicate it? It happens where the disk sizes are normally
>>>> reported, so it could be a scsi/dma/irq/fpu issue as well.
>>>
>>> IIRC the DVMA address was 0xfc004000, but the mapped entries were for
>>> 0xfc000000 to 0xfc003fff.
>
> Under OpenBIOS. And even less with OBP, and much less if the network
> card is disabled.
>
>> It looks like we have multiple problems here: they start with
>> 0xfc004000 access (which can theoretically be expected on the real
>> hardware too) as you pointed out, but what happens afterwards is
>> strange too:
>>
>> - In the current qemu implementation we have a screaming NMI which
>> NetBSD cannot clear. This happens because NMI in qemu is literally
>> non-maskable, while on the real hardware it can be masked with the
>> 'mask all' flag. I'll send a patch for it.
>>
>> - With the masking patch, the NMI is not screaming but is still
>> perceived as spurious. This may be ok if NetBSD (1.6-3.1) doesn't have
>> a moduleerr_handler set.
>
> Or because a scsi dma transfer on real hardware never generates an NMI.
>
> In the current implementation, when "select with attention" is
> processed, scsi controller initiates a dma transfer and fetches a CDB.
> If dma fails (not mapped, or not allowed), an NMI is generated. It is
> quite a strange design: such an error is an asynchronous event, and
> the CPU wouldn't know that the scsi controller tried to do some dma at
> a certain address. It would have been more consistent to send the error
> notification to the dma initiator (the scsi controller in this case), not
> to the CPU.
>
> The offending code in NetBSD 1.6-3.1:
>
> // Here it crashes (under qemu) because the dma page is not valid
> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
> // The page would have been made valid here.
> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
> NCRDMA_GO(sc);
>
> In the working versions (before 1.6 and after 4.0) the code looks like this:
>
> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
> //...
> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
> NCRDMA_GO(sc);
>
> After debugging the code on the real hardware, it looks like qemu has
> multiple problems in scsi/dma/iommu layer.
>
> I modified NCRDMA_SETUP, so that it did dma transfer without mapping
> the page. In this case NetBSD 3.1 shows the following error (on a real
> SS-20):
>
> dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
> esp0: DMA error; resetting
> dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
>
> no NMI.
>
> And what is more important, on the real hardware "select with
> attention" does not initiate dma (I put in a delay, waited 2 seconds,
> and nothing happened). It has to be done manually.
>
> Any suggestions how to fix it according to the current iommu/dma
> architecture? Looks like "select with attention" should register
> callbacks?  ( Volunteers? ;-) )

Excellent analysis!

About NMI: IOMMU just raises the qemu_irq provided by sun4m.c. The
interrupt bit number is currently 30, which is Module Error
(asynchronous fault). Maybe this should be 29, MSI (MBus-SBus
Interface) interrupt? That is still NMI though. Could you check what
interrupt bits get active in the interrupt controller master status?
What is in IOMMU AFSR?

About select with attention: NCRDMA_GO just tweaks the DMA controller,
so ESP shouldn't perform the transfer if DMA is not ready. I think Linux
always pre-programs DMA.

One way to handle this would be to add a qemu_irq signal from DMA to
ESP which tells ESP whether DMA is ready. DMA raises or lowers the
interrupt whenever DMA is enabled or disabled. When the IRQ is
received by ESP, if there is no transfer pending, it just adjusts an
internal flag about DMA status. If there is a transfer pending, it is
started. When ESP handles a command, it should check the internal DMA
flag. If DMA is ready, continue with the transfer immediately like
now. Otherwise, hold the transfer and store parameters to internal
state. I wonder what state bits ESP will show when this happens.
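
To make the idea concrete, here is a rough standalone sketch of the
bookkeeping described above. It is only a model, not QEMU code: the
ESPModel struct and all function names are made up for illustration, a
plain bool stands in for the qemu_irq line, and "starting a transfer" is
reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ESP side; not the real QEMU device state. */
typedef struct ESPModel {
    bool dma_ready;         /* last level seen on the DMA-ready line */
    bool xfer_pending;      /* a command is being held until DMA is ready */
    uint32_t pending_len;   /* parameters saved for the held transfer */
    unsigned xfers_started; /* how many transfers were actually started */
} ESPModel;

/* Stand-in for the real transfer; here it only counts invocations. */
static void esp_start_transfer(ESPModel *s, uint32_t len)
{
    (void)len;
    s->xfers_started++;
}

/* Called when the DMA controller raises or lowers the ready line. */
static void esp_dma_ready_changed(ESPModel *s, bool level)
{
    s->dma_ready = level;
    if (level && s->xfer_pending) {
        /* DMA became ready and a transfer was held back: start it now. */
        s->xfer_pending = false;
        esp_start_transfer(s, s->pending_len);
    }
}

/* Called when the guest issues a DMA command such as select with attention. */
static void esp_handle_dma_command(ESPModel *s, uint32_t len)
{
    if (s->dma_ready) {
        /* DMA already programmed: continue immediately, as happens now. */
        esp_start_transfer(s, len);
    } else {
        /* Hold the transfer and remember its parameters. */
        assert(!s->xfer_pending);
        s->xfer_pending = true;
        s->pending_len = len;
    }
}
```

With this ordering, the NetBSD 1.6-3.1 sequence (command first, DMA
setup afterwards) would hold the transfer until DMA is enabled, while
the pre-programmed case would behave as before. What ESP should report
in its status bits while a transfer is held is left open here.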


