[Qemu-devel] Re: NMI handling


From: Artyom Tarasenko
Subject: [Qemu-devel] Re: NMI handling
Date: Mon, 26 Jul 2010 18:53:54 +0200

2010/6/21 Artyom Tarasenko <address@hidden>:
> 2010/5/25 Blue Swirl <address@hidden>:
>>>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>>>
>>> What does indicate it? It happens where the disk sizes are normally
>>> reported, so it could be a scsi/dma/irq/fpu issue as well.
>>
>> IIRC the DVMA address was 0xfc004000, but the mapped entries were for
>> 0xfc000000 to 0xfc003fff.

That is under OpenBIOS. With OBP even fewer entries are mapped, and far
fewer if the network card is disabled.

> It looks like we have multiple problems here: they start with
> 0xfc004000 access (which can theoretically be expected on the real
> hardware too) as you pointed out, but what happens afterwards is
> strange too:
>
> - In the current qemu implementation we have a screaming NMI which
> NetBSD cannot clear. This happens because the NMI in qemu is literally
> non-maskable, while on real hardware it can be masked with the
> 'mask all' flag. I'll send a patch for it.
>
> - With the masking patch, the NMI is no longer screaming but is still
> perceived as spurious. This may be OK if NetBSD (1.6-3.1) doesn't have
> a moduleerr_handler set.

Or because a SCSI DMA transfer on real hardware never generates an NMI.

In the current implementation, when "select with attention" is
processed, the SCSI controller initiates a DMA transfer and fetches a
CDB. If the DMA fails (page not mapped, or access not allowed), an NMI
is generated. This is quite a strange design: such an error is an
asynchronous event, and the CPU would not know that the SCSI controller
tried to do DMA at a certain address. It would be more consistent to
send the error notification to the DMA initiator (the SCSI controller in
this case), not to the CPU.
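
To make the alternative concrete, here is a minimal toy sketch of "report
the failure to the initiator". It is not QEMU code: every name
(toy_machine, toy_scsi, toy_dma_read, toy_fetch_cdb, TOY_CSR_ERR) and the
bit position are made up for illustration, and the only detail taken from
this thread is the mapped range 0xfc000000 to 0xfc003fff. A failed DMA
read latches an error bit in the device's own status and the CPU's NMI
counter is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, not QEMU code. */
#define TOY_MAP_START 0xfc000000u
#define TOY_MAP_END   0xfc003fffu   /* inclusive; range quoted in the thread */
#define TOY_CSR_ERR   0x1u          /* invented bit position */

struct toy_machine {
    uint8_t  backing[TOY_MAP_END - TOY_MAP_START + 1];
    unsigned nmi_count;             /* NMIs delivered to the CPU */
};

struct toy_scsi {
    uint32_t csr;                   /* device-side status register */
    uint8_t  cdb[6];
};

/* Returns 0 on success, -1 if any byte of the request is unmapped. */
static int toy_dma_read(struct toy_machine *m, uint32_t addr,
                        void *buf, uint32_t len)
{
    if (len == 0 || addr < TOY_MAP_START || addr > TOY_MAP_END ||
        len - 1 > TOY_MAP_END - addr) {
        return -1;
    }
    memcpy(buf, &m->backing[addr - TOY_MAP_START], len);
    return 0;
}

/*
 * Fetch a CDB by DMA. On failure the error is latched in the
 * initiator's own status; nothing here raises an NMI.
 */
static bool toy_fetch_cdb(struct toy_machine *m, struct toy_scsi *s,
                          uint32_t addr)
{
    if (toy_dma_read(m, addr, s->cdb, sizeof(s->cdb)) != 0) {
        s->csr |= TOY_CSR_ERR;
        return false;
    }
    return true;
}

/* Usage sketch: one fetch inside the mapped range, one just past it. */
static void toy_demo(void)
{
    static struct toy_machine m;    /* zero-initialised */
    struct toy_scsi s = {0};

    assert(toy_fetch_cdb(&m, &s, 0xfc000000u));
    assert(!toy_fetch_cdb(&m, &s, 0xfc004000u));
    assert((s.csr & TOY_CSR_ERR) != 0);
    assert(m.nmi_count == 0);
}
```

In this model the driver would find out about the failure by reading the
device status, and the CPU is never interrupted with an NMI.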

The offending code in NetBSD 1.6-3.1:

/* Crashes here under qemu because the DMA page is not yet valid. */
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
/* The page would have been made valid here. */
NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
NCRDMA_GO(sc);

In the working versions (before 1.6 and after 4.0) the code looks like this:

NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
//...
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_GO(sc);

After debugging the code on real hardware, it looks like qemu has
multiple problems in the SCSI/DMA/IOMMU layer.

I modified NCRDMA_SETUP so that it starts the DMA transfer without
mapping the page. In this case NetBSD 3.1 shows the following error (on
a real SS-20):

dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
esp0: DMA error; resetting
dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>

No NMI.

More importantly, on real hardware "select with attention" does not
initiate DMA by itself (I put in a delay, waited 2 seconds, and nothing
happened). The transfer has to be started manually.

Any suggestions on how to fix this within the current IOMMU/DMA
architecture? It looks like "select with attention" should register
callbacks? (Volunteers? ;-) )
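
A rough sketch of what I mean by registering callbacks, as a toy model
only. None of these names (toy_dma, toy_esp, toy_esp_selatn_dma,
toy_dma_setup, toy_dma_go, toy_run) exist in qemu, and the real esp /
sparc32_dma interfaces look different. The assumption is that "select
with attention" only registers the CDB fetch, and that the fetch runs
once the driver starts the DMA engine. With that, both driver orderings
shown above work, and an unmapped page becomes a device-side error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the QEMU API. */
typedef void (*toy_dma_cb)(void *opaque);

struct toy_dma {
    bool       mapped;    /* has the driver mapped the buffer? */
    toy_dma_cb pending;   /* transfer registered by the device, if any */
    void      *opaque;
};

struct toy_esp {
    struct toy_dma *dma;
    bool cdb_fetched;
    bool dma_error;       /* reported to the device, not as an NMI */
};

/* Callback that performs the deferred CDB fetch. */
static void toy_esp_fetch_cdb(void *opaque)
{
    struct toy_esp *s = opaque;

    if (s->dma->mapped) {
        s->cdb_fetched = true;
    } else {
        s->dma_error = true;
    }
}

/* "Select with attention" with the DMA bit: register, do not transfer. */
static void toy_esp_selatn_dma(struct toy_esp *s)
{
    s->dma->pending = toy_esp_fetch_cdb;
    s->dma->opaque = s;
}

/* Stand-in for the driver's setup step: the page becomes valid. */
static void toy_dma_setup(struct toy_dma *d)
{
    d->mapped = true;
}

/* Stand-in for the driver's go step: run whatever was registered. */
static void toy_dma_go(struct toy_dma *d)
{
    if (d->pending) {
        toy_dma_cb cb = d->pending;
        d->pending = NULL;
        cb(d->opaque);
    }
}

/* Returns 1 if the CDB was fetched, -1 on DMA error, 0 if nothing ran. */
static int toy_run(bool select_first, bool do_setup)
{
    struct toy_dma d = {0};
    struct toy_esp s = { .dma = &d };

    if (select_first) {             /* ordering used by NetBSD 1.6-3.1 */
        toy_esp_selatn_dma(&s);
        assert(!s.cdb_fetched && !s.dma_error);  /* nothing moves yet */
        if (do_setup) {
            toy_dma_setup(&d);
        }
    } else {                        /* ordering used by the working versions */
        if (do_setup) {
            toy_dma_setup(&d);
        }
        toy_esp_selatn_dma(&s);
    }
    toy_dma_go(&d);
    return s.dma_error ? -1 : (s.cdb_fetched ? 1 : 0);
}
```

In this model toy_run(true, true) and toy_run(false, true) both fetch
the CDB, while toy_run(true, false) ends with a DMA error on the device
side, roughly like the SS-20 log above.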


-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/


