qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 5/5] i8259: fix dynamically masking slave IRQ


From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH v4 5/5] i8259: fix dynamically masking slave IRQs with IMR register
Date: Mon, 03 Sep 2012 10:51:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-09-03 04:56, Matthew Ogilvie wrote:
> Although I haven't found any specs that say so, on real hardware
> I have a test program that shows if you mask off the slave
> interrupt (say IRQ14) in the IMR after it has already been raised,
> the master (IRQ2) gets canceled (that is, IRQ2 acts like it is level
> triggered).  Without this patch, qemu delivers a
> spurious interrupt (IRQ15) instead when running the test program.
> 
> Signed-off-by: Matthew Ogilvie <address@hidden>
> ---
> 
> I've written a test program (in the form of a floppy disk boot sector)
> that demonstrates that qemu's emulation of IRQ2 propagation from the
> slave i8259 to the master does not work correctly when the CPU has
> interrupts disabled and it masks off the original interrupt (IRQ14)
> in the slave's IMR register.  This was based on simplifying breakage
> observed when trying to run an old Microport UNIX system (ca 1987).
> 
> Earlier I speculated that maybe the ELCR bit for IRQ2 was incorrect,
> but now I don't think changing the bit (from the target's
> perspective) would be a good idea.  See below.

Yes, you cannot set IRQ0..2 to level triggered mode on modern chipsets,
look e.g. at the ICH9 spec:

"In edge mode, (bit[x] = 0), the interrupt is recognized by a low to
high transition. In level mode (bit[x] = 1), the interrupt is recognized
by a high level. The cascade channel, IRQ2, the heart beat timer (IRQ0),
and the keyboard controller (IRQ1), cannot be put into level mode."

So, fiddling with ELCR from guest perspective cannot be the solution.

> 
> You can download the source code for the test program from
> http://home.comcast.net/~mmogilvi/downloads/i8259-imr-test-2012-09-02.tar.bz2
> It can be compiled using a standard GNU i386 toolchain on Linux.

As Paolo said, we are better off with a KVM unit test. That should be
loadable via grub & co. on real HW as well.

> 
> The heart of the test program is:
> 

Initialization is also important to exclude leftovers of the BIOS.

>         cli
> 
>           # i8259:imm: mask off everything except IRQ2
>         movb $0xfb,%al     # master (only IRQ2 clear)
>         outb %al,$0x21
>         movb $0xff,%al     # slave
>         outb %al,$0xa1
> 
>         mov $.msgCmdRead,%ax
>         call print
>         call initIrqHandlers
>         call scheduleIrq14
> 
>         call .largeDelay   # note: IRQ14 raised while this is waiting
> 
>         mov $.msgUnmask,%ax
>         call print
>         movb $0x3f,%al     # unmask IRQ14 and IRQ15
>         outb %al,$0xa1
> 
>         call .largeDelay   # (probably not important)
> 
>         mov $.msgMask,%ax
>         call print
>         movb $0xff,%al     # mask IRQ14 and IRQ15 again
>         outb %al,$0xa1

And now check IRR of the master by reading it back. That would be a
smoking gun if bit 2 is set before and cleared afterward.

> 
>         call .largeDelay   # (probably not important)
> 
>         mov $.msgSti,%ax
>         call print
>         sti
> 
>         call .largeDelay   # note: spurious interrupt (IRQ15) from qemu
> 
>         cli
>         mov $.msgUnmask,%ax
>         call print
>         movb $0x3f,%al     # unmask IRQ14 and IRQ15
>         outb %al,$0xa1
>         sti
> 
>         call .largeDelay   # we should finally see IRQ14 here?
> 
>           # DONE:
>         cli
>         movw $.msgDone, %ax
>         call print
>   .Lhalt:
>         hlt
>         jmp .Lhalt
> 
> In qemu, the master treats IRQ2 as edge triggered, and delivers a
> spurious interrupt if the CPU re-enables interrupts after
> the slave's IMR is masked off (it also doesn't seem to deliver
> the real interrupt when the IMR is unmasked, but maybe that is
> because I'm not correctly acknowledging the spurious interrupt).

Yes, that is required.

> 
>   - Qemu output (without this patch):
>         elcr=0c00 cmdRead ummask mask sti irq15 unmask DONE
> 
> But on real hardware, the master seems to treat IRQ2 as level triggered,
> and doesn't deliver an interrupt to the CPU until the slave unmasks IRQ14.
> I've tried this on several machines, including a non-PCI 486 that
> does not seem to have ELCR registers.
> 
>   - Typical output (pentium/pentium pro/dual AMD Athlon/etc; slight
>     variations in some elcr bits depending on machine, but bit 0x04
>     is always clear):
>         elcr=0c20 cmdRead unmask mask sti unmask irq14 DONE
> 
>   - 486 without elcr (just an ISA bus):
>         elcr=e0e0 cmdRead unmask mask sti unmask irq14 DONE
> 
>   - One failure: It doesn't boot properly (no output) with a USB floppy
>     drive on my Intel Core I7.  Guess: The test program just barely
>     fits in a sector, with no room for any tables (partition/etc)
>     that BIOS might check for if it isn't an original, native floppy
>     drive.
> 
> -----------------------
> 
> I've found a few descriptions of programming the i8259.

I was mostly following the official "8259A PROGRAMMABLE INTERRUPT
CONTROLLER (8259A 8259A-2)" by Intel. But it also doesn't mention any
trigger mode changes for the cascading input lines.

> The closest thing I've found to a detailed spec is in
> "iAPX 86, 88 User's Manual", dated August 1981:
> http://ebookbrowse.com/1981-iapx-86-88-users-manual-pdf-d3089962
> It has a significant section titled "Using the 8259A Programmable
> Interrupt Controller" starting on page 438 or A-135.
> 
> But none of my sources seem to specify how master/slave cascading
> interacts with the IMR (mask register) and edge trigger mode,
> although it talks about those things individually.
> Also, given the date it isn't surprising that it doesn't mention
> the elcr registers at all.
> 
> I have not found any real specs for the ELCR registers, but I've found a
> few hints:

Intel chipset docs describe it, though only briefly.

> 
>   - Two 8 bit registers: one for master (0x4d0) and one for slave (0x4d1).
>   - One bit per IRQ line: 0 for edge trigger, 1 for level trigger.
>   - Not present unless the machine has EISA or PCI buses.  Plain
>     ISA doesn't have it.
>   - Might be implemented completely external to the actual i8259s.
>   - As seen in test program output above, ELCR bit 0x04 is clear,
>     indicating that IRQ2 is supposedly edge triggered, even though
>     actual tested behavior is level triggered.
>   - A google search (8259 elcr -linux -qemu) [exclude the
>     dominant Linux and qemu related pages] found at least one operating
>     system that checks several ELCR bits (including IRQ2) as part
>     of a probe to determine if ELCR exists.  So simply setting
>     the 0x04 bit is probably a bad idea.  For example, line 119 of:
>     https://bitbucket.org/npe/nix/src/35d39df17077/src/9kron2/k8/i8259.c
> 
> -----------------------
> 
> My guess is that if a master IRQ line is marked as coming from a slave
> (in the ICW3 register), then the i8259 treats that line as level
> triggered regardless of ELCR registers or the LTIM bit of ICW1 (in
> addition to other special slave line logic).  Otherwise, the slave
> IMR register would seem rather useless.

That IMR is surely not useless, even if we do not automatically revoke
the IRQ2 event. It would just require more careful unmasking in that case.

> 
> Under that theory, something like the following patch would be appropriate
> for qemu.  This fixes both the test program, and the original Microport
> UNIX problem.
> 
> Possible concerns with this patch:
> 
>   - KVM still needs to be fixed.  I haven't researched this at all,
>     beyond noticing that it has the same broken behavior running the
>     test program, and the high level symptoms from Microport UNIX
>     are slightly different with KVM.
>   - It might also be a good idea to add something like my test
>     program to qemu/tests somewhere.  This would be a separate patch.
>   - Best icw3 configuration strategy?: Should it be stored, or
>     just assume it is correctly configured based on
>     PICCommonState::master and standard PC IRQ2 configuration?

Let's avoid adding more logic that assumes cascading IRQ == 2.

>     Perhaps add sanity checks for reasonable values when the
>     guest is configuring the 8259s?  Did I catch all the places

No need for sanity checks if the logic is generic enough.

>     that need to deal with a new state variable (assuming we
>     decide to store it)?
>   - This patch is adding code to what may be a relatively common
>     code path (every time interrupts raised/lowered/acknowledged, masks
>     changed, etc).  Potentially it could be optimized by adding an
>     "effective_elcr" state variable, that is maintined as a combination
>     of icw3 and elcr (and maybe add support for LTIM bit from ICW1 at the
>     same time), so that we don't need to have extra code in the
>     critical path.  Does anyone think this is worth it?  I'm leaning
>     towards "no".

No need for such a micro-optimization.

> 
> ==========================================
> 
>  hw/i8259.c          | 12 ++++++++----
>  hw/i8259_common.c   |  2 ++
>  hw/i8259_internal.h |  1 +
>  3 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..8e2f9f4 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -140,7 +140,7 @@ static void pic_set_irq(void *opaque, int irq, int level)
>      }
>  #endif
>  
> -    if (s->elcr & mask) {
> +    if ((s->elcr | s->icw3) & mask) {
>          /* level triggered */
>          if (level) {
>              s->irr |= mask;
> @@ -174,7 +174,7 @@ static void pic_intack(PICCommonState *s, int irq)
>          s->isr |= (1 << irq);
>      }
>      /* We don't clear a level sensitive interrupt here */
> -    if (!(s->elcr & (1 << irq))) {
> +    if (!((s->elcr | s->icw3) & (1 << irq))) {
>          s->irr &= ~(1 << irq);
>      }
>      pic_update_irq(s);
> @@ -314,6 +314,10 @@ static void pic_ioport_write(void *opaque, 
> target_phys_addr_t addr64,
>              s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2;
>              break;
>          case 2:
> +            s->icw3 = val;
> +            /* TODO: Enforce constraints?: Master is typically
> +             *   0x04 for IRQ2 (maybe 0 is also OK).  Slave must be 0.
> +             */

No constraints, see above (and slave must be 2, its ID == cascading IRQ#).

>              if (s->init4) {
>                  s->init_state = 3;
>              } else {
> @@ -419,9 +423,9 @@ void pic_info(Monitor *mon)
>      for (i = 0; i < 2; i++) {
>          s = i == 0 ? DO_UPCAST(PICCommonState, dev.qdev, isa_pic) : 
> slave_pic;
>          monitor_printf(mon, "pic%d: irr=%02x imr=%02x isr=%02x hprio=%d "
> -                       "irq_base=%02x rr_sel=%d elcr=%02x fnm=%d\n",
> +                       "irq_base=%02x icw3=%02x rr_sel=%d elcr=%02x 
> fnm=%d\n",
>                         i, s->irr, s->imr, s->isr, s->priority_add,
> -                       s->irq_base, s->read_reg_select, s->elcr,
> +                       s->irq_base, s->icw3, s->read_reg_select, s->elcr,
>                         s->special_fully_nested_mode);
>      }
>  }
> diff --git a/hw/i8259_common.c b/hw/i8259_common.c
> index ab3d98b..dcde5f2 100644
> --- a/hw/i8259_common.c
> +++ b/hw/i8259_common.c
> @@ -33,6 +33,7 @@ void pic_reset_common(PICCommonState *s)
>      s->isr = 0;
>      s->priority_add = 0;
>      s->irq_base = 0;
> +    s->icw3 = 0;
>      s->read_reg_select = 0;
>      s->poll = 0;
>      s->special_mask = 0;
> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = {
>          VMSTATE_UINT8(isr, PICCommonState),
>          VMSTATE_UINT8(priority_add, PICCommonState),
>          VMSTATE_UINT8(irq_base, PICCommonState),
> +        VMSTATE_UINT8(icw3, PICCommonState),

Let's add this as an optional subsection, only written when it's not
0x04 (for a master) or 0x2 (for a slave). See target-i386/machine.c for
examples, or read docs/migration.txt. That will mean you need to set
icw3 to 0x4 before loading a vmstate (=> pre_load handler).

The effect is that, generally, no additional state will be written in
practice - unless we save right after reset or some guest messes its PIC
configuration up.

>          VMSTATE_UINT8(read_reg_select, PICCommonState),
>          VMSTATE_UINT8(poll, PICCommonState),
>          VMSTATE_UINT8(special_mask, PICCommonState),
> diff --git a/hw/i8259_internal.h b/hw/i8259_internal.h
> index 4137b61..de67d49 100644
> --- a/hw/i8259_internal.h
> +++ b/hw/i8259_internal.h
> @@ -55,6 +55,7 @@ struct PICCommonState {
>      uint8_t isr; /* interrupt service register */
>      uint8_t priority_add; /* highest irq priority */
>      uint8_t irq_base;
> +    uint8_t icw3; /* bit set if line is connected to a slave */

Comment is wrong. This word is used for masters and slaves.

>      uint8_t read_reg_select;
>      uint8_t poll;
>      uint8_t special_mask;
> 

The only thing that worries me is that we just consider the PC so far
while the i8259 is also used on different architectures (PPC, MIPS, Alpha?).

Jan

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]