
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU
Date: Wed, 27 Apr 2016 15:29:23 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
> 2016-04-26 15:34+0800, Peter Xu:
> > Hi, Jan,
> > 
> > The above issue should be caused by EOI missing of level-triggered
> > interrupts. Before that, I was always using edge-triggered
> > interrupts for test, so didn't encounter this one. Would you please
> > help try below patch? It can be applied directly onto the series,
> > and should solve the issue (it works on my test vm, and I'll take it
> > in v5 as well if it also works for you):
> > 
> > -------------------------
> > 
> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
> > +/*
> > + * This is to satisfy the hack in Linux kernel. One hack of it is to
> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> > + * following:
> > + *
> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> > + * Otherwise, we simulate the EOI message manually by changing the trigger
> > + * mode to edge and then back to level, with RTE being masked during
> > + * this."
> > + *
> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> > + *
> > + * This is based on the assumption that, Remote IRR bit will be
> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> > + * believe that's what the IOAPIC version 0x1X hardware does).
> 
> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
> on EOI broadcast from LAPIC -- does that change with IR?

IIUC, ioapic_ack_level() is the one that handles EOI when IR is
disabled, and the EOI broadcast should happen at:

        ack_APIC_irq();

After that, there are a few more lines:

        /*
         * Tail end of clearing remote IRR bit (either by delivering the EOI
         * message via io-apic EOI register write or simulating it using
         * mask+edge followed by unnask+level logic) manually when the
         * level triggered interrupt is seen as the edge triggered interrupt
         * at the cpu.
         */
        if (!(v & (1 << (i & 0x1f)))) {
                atomic_inc(&irq_mis_count);
                eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
        }

My understanding of the above: first, we do the EOI broadcast. Then,
if a level-triggered interrupt was seen as an edge-triggered one at
the CPU (which is exactly what I encountered, see below), we do one
more explicit EOI in eoi_ioapic_pin(), which plays the mask+edge /
unmask+level trick for IOAPICs with version 0x1X.

For the IR-enabled case, we just do both without checking (see
ioapic_ir_ack_level()).

So the problem should not happen if either of the two works. In
other words, without this patch, neither the "EOI broadcast" nor the
"explicit EOI (hacky version)" works for the IR case. This patch
fixes the latter, and I am still looking for the reason the former
fails.
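
To make the mask+edge / unmask+level trick concrete, here is a minimal
standalone sketch (not QEMU or kernel code). The names
model_write_rte_low() and guest_simulated_eoi() are hypothetical. The
bit positions follow the 82093AA RTE layout, and the sketch assumes
that Remote IRR is read-only for guest writes, as on hardware, which
is more than the hunk quoted above does by itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the 82093AA redirection table entry layout. */
#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */
#define RTE_MASKED       (1ULL << 16)

/*
 * Hypothetical device model: a guest write to the low 32 bits of an RTE.
 * Assumption: Remote IRR is read-only for the guest, so the old value is
 * kept, and it is dropped only when the entry ends up edge-triggered.
 */
static void model_write_rte_low(uint64_t *rte, uint32_t val)
{
    uint64_t remote_irr;

    assert(rte != NULL);
    remote_irr = *rte & RTE_REMOTE_IRR;
    *rte &= ~0xffffffffULL;
    *rte |= (val & ~RTE_REMOTE_IRR) | remote_irr;
    if (!(*rte & RTE_TRIGGER_MODE)) {
        *rte &= ~RTE_REMOTE_IRR;
    }
}

/* Guest side: mask+edge first, then write the original level entry back. */
static void guest_simulated_eoi(uint64_t *rte)
{
    uint32_t orig;

    assert(rte != NULL);
    orig = (uint32_t)*rte;
    model_write_rte_low(rte,
                        (uint32_t)((orig | RTE_MASKED) & ~RTE_TRIGGER_MODE));
    model_write_rte_low(rte, orig);
}
```

Under these assumptions, a level-triggered entry that enters
guest_simulated_eoi() with Remote IRR set comes out unmasked,
level-triggered and with Remote IRR clear.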

> 
> > + *                                                             So
> > + * if we are emulating it, we'd better do it the same here, so that
> > + * the guest kernel hack will work as well on QEMU.
> 
> Totally.
> 
> > + * Without this, level-triggered interrupts in IR mode might fail to
> > + * work correctly.
> 
> (I don't really understand why it worked before.)

Yes. Actually, what I want to try is to take a machine with a
hardware IOMMU, plug a real e1000 card into it, and see whether the
current Linux kernel IOMMU driver copes with level-triggered devices
(I suppose this scenario is rare, since level-triggered interrupts
are mostly legacy, IIUC).

> 
> > + */
> > +static inline void
> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
> > +{
> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> > +        /* Level triggered interrupts, make sure remote IRR is zero */
> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> 
> (You can just unconditionally zero it, edge doesn't care.)

Ah! I made a mistake. I suppose what I really want is:

+    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
+        /* Edge-triggered interrupts, make sure remote IRR is zero */
+        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+    }

Both should do the trick, but I will use this new one in v5.
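
For comparison, a small standalone sketch of the two variants side by
side. The function names fix_if_level() and fix_if_edge() are made up
for illustration, and the bit positions are taken from the 82093AA RTE
layout rather than from the QEMU headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */

/* Variant as first posted: clears Remote IRR when the entry is level. */
static void fix_if_level(uint64_t *entry)
{
    assert(entry != NULL);
    if (*entry & RTE_TRIGGER_MODE) {
        *entry &= ~RTE_REMOTE_IRR;
    }
}

/* Corrected variant: clears Remote IRR only when the entry is edge. */
static void fix_if_edge(uint64_t *entry)
{
    assert(entry != NULL);
    if (!(*entry & RTE_TRIGGER_MODE)) {
        *entry &= ~RTE_REMOTE_IRR;
    }
}
```

The difference shows up on a level->level rewrite: fix_if_level()
drops Remote IRR even though the interrupt may still be in service,
while fix_if_edge() leaves it alone and only acts once the entry is
switched to edge.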

> 
> > +    }
> > +}
> > +
> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
> >                      s->ioredtbl[index] &= ~0xffffffffULL;
> >                      s->ioredtbl[index] |= val;
> >                  }
> > +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
> 
> I think this can be done only in the else branch of (s->ioregsel & 1).

Yes, I can move it there, but then there is a hidden assumption (or
rather, a fact...) that these magic bits live in entry bits 31-0, and
readers who do not know that might be confused.  IMHO, for better
readability, I would still prefer to keep it here (meaning "we need
to make sure the entry satisfies some rule, but the caller does not
need to know what the rule is"). If you still insist, I will take
your advice though. :)
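
A rough standalone sketch of the placement being discussed, assuming
the usual indirect register layout where the redirection table starts
at register 0x10 and odd registers select the high dword. The names
write_redtbl() and fix_edge_remote_irr() are hypothetical, not the
QEMU ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REDTBL_BASE      0x10           /* assumed first RTE register */
#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */

static void fix_edge_remote_irr(uint64_t *entry)
{
    if (!(*entry & RTE_TRIGGER_MODE)) {
        *entry &= ~RTE_REMOTE_IRR;
    }
}

/* One 32-bit guest write through the register window into a 64-bit RTE. */
static void write_redtbl(uint64_t *tbl, uint8_t ioregsel, uint32_t val)
{
    int index;

    assert(tbl != NULL && ioregsel >= REDTBL_BASE);
    index = (ioregsel - REDTBL_BASE) >> 1;
    if (ioregsel & 1) {
        /* High dword: destination bits only. */
        tbl[index] &= 0xffffffffULL;
        tbl[index] |= (uint64_t)val << 32;
    } else {
        /* Low dword: vector, trigger mode, Remote IRR, mask. */
        tbl[index] &= ~0xffffffffULL;
        tbl[index] |= val;
    }
    /* Run after either half, so the caller need not know the bit layout. */
    fix_edge_remote_irr(&tbl[index]);
}
```

Moving the call into the low-dword branch would behave the same for
low-dword writes, since both bits sit in bits 31-0. The trade-off is
only about who has to know that.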

> 
> (If the guest kernel does level->edge->level, then remote_irr probably
>  should be cleared only on edge->level transition and not on
>  level->level, but I haven't seen that in the spec ...)

Agreed. That is what my diff above is trying to fix. Thanks for
pointing it out.

> 
> >                  ioapic_service(s);
> > ------------------------
> > 
> > I am still looking into guest part codes. Although the above patch
> > should solve the issue, there are still issues in guest codes when
> > IR is enabled:
> > 
> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
> 
> "required" is a way of saying that the opposite is undefined.
> No need to think about it in IOMMU.

Why? Without the correct vector, how can the IOAPIC know which
entry's Remote IRR bit to clear (please check
ioapic_eoi_broadcast())?
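
To illustrate the concern, a simplified standalone sketch of how an
EOI broadcast can be matched against the redirection table by vector.
This is not the QEMU function itself; eoi_broadcast() and the
constants are stand-ins that I made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTE_VECTOR_MASK  0xffULL
#define RTE_REMOTE_IRR   (1ULL << 14)

/*
 * Clear Remote IRR on every entry whose vector field equals the vector
 * carried by the EOI. Returns the number of entries that were cleared.
 */
static int eoi_broadcast(uint64_t *rte, size_t npins, uint8_t vector)
{
    size_t i;
    int cleared = 0;

    assert(rte != NULL);
    for (i = 0; i < npins; i++) {
        if (!(rte[i] & RTE_REMOTE_IRR)) {
            continue;
        }
        if ((rte[i] & RTE_VECTOR_MASK) != vector) {
            continue;
        }
        rte[i] &= ~RTE_REMOTE_IRR;
        cleared++;
    }
    return cleared;
}
```

If the RTE holds vector 0x31 while the CPU actually received (and
EOIs) the IRTE's vector, say 0x41, nothing matches and Remote IRR
stays set on that pin.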

> 
> > - I encountered that level-triggered entries in IOAPIC is marked as
> >   edge-triggered interrupt in APIC (which is strange)...
> 
> What/where do you mean?
> (The only difference I know of is that level triggered vectors in LAPIC
>  have their respective TMR bit set while edge do not.)

Exactly. Here is what I mean:

static void apic_eoi(APICCommonState *s)
{
    int isrv;
    isrv = get_highest_priority_int(s->isr);
    if (isrv < 0)
        return;
    apic_reset_bit(s->isr, isrv);
    if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
        ioapic_eoi_broadcast(isrv);
    }
    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
    apic_update_irq(s);
}

The APIC notifies the IOAPIC only if the TMR bit for the vector is
set (the "apic_get_bit(s->tmr, isrv)" check), i.e. only if the APIC
registers mark it as a level-triggered interrupt. What I have traced
is that the EOI broadcast is missing because this bit is clear in the
APIC TMR when it should be set. I need some more tests to confirm
this, though, in case I made a mistake.
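
The gating can be sketched in isolation as below, treating the TMR as
a 256-bit register stored in eight 32-bit words. The helper names are
hypothetical, and the suppress flag just stands in for the
APIC_SV_DIRECTED_IO test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit "vector" of a 256-bit APIC register (ISR/TMR/IRR style). */
static int reg_get_bit(const uint32_t *reg, int vector)
{
    assert(reg != NULL && vector >= 0 && vector < 256);
    return (reg[vector >> 5] >> (vector & 0x1f)) & 1;
}

/* Set bit "vector", as done when a level-triggered interrupt is accepted. */
static void reg_set_bit(uint32_t *reg, int vector)
{
    assert(reg != NULL && vector >= 0 && vector < 256);
    reg[vector >> 5] |= 1u << (vector & 0x1f);
}

/* An EOI is forwarded to the IOAPIC only for vectors marked level in TMR. */
static int eoi_is_broadcast(const uint32_t *tmr, int vector, int suppress)
{
    return !suppress && reg_get_bit(tmr, vector);
}
```

So if the interrupt reaches the APIC as edge-triggered, its TMR bit is
never set and eoi_is_broadcast() is false, which matches the missing
broadcast I am seeing.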

(P.S. I saw some similar comments in the kernel code; please check
the long comment in ioapic_ack_level().  Not sure whether they are
related.)

Thanks!

-- peterx


