
From: Matthew Ogilvie
Subject: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
Date: Thu, 23 Aug 2012 00:24:43 -0600

This patch provides a way to optionally suppress spurious interrupts,
as a workaround for systems described below:

Some old operating systems do not handle spurious interrupts well,
and qemu tends to generate them significantly more often than
real hardware.

Examples:
  - Microport UNIX System V/386 v 2.1 (ca 1987)
    (The main problem I'm fixing: Without this patch, it panics
    sporadically when accessing the hard disk.)
  - AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991)
    See screenshot in "QEMU Official OS Support List":
    http://www.claunia.com/qemu/objectManager.php?sClass=application&iId=9
    (I don't have this system to test.)
  - A report about OS/2 boot lockup from 2004 by Hampa Hug:
    http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html
    (My patch was partially inspired by his.)
    Also: http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html
    (I don't have this system to test.)

Signed-off-by: Matthew Ogilvie <address@hidden>
---

Note: checkpatch.pl gives an error about initializing the global
"int no_spurious_interrupt_hack = 0;", even though existing lines
near it are doing the same thing.  Should I give precedence to
checkpatch.pl, or to nearby code?

There was no version 1 of this patch; this was the last thing I had to
work around to get UNIX running.

High level symptoms:
   1. Despite using this UNIX system for nearly 10 years (ca 1987-1996)
      on an early 80386, I don't remember ever seeing any crash like
      this.  I vaguely remember I may have had one or two crashes for
      which I don't have other explanations that perhaps could have
      been this, but I don't remember the error messages to confirm it.
   2. When running in qemu, UNIX crashes somewhat randomly.
       - Sometimes it crashes the first time the floppy-based installer
         tries to access the hard disk (partition table?).
       - Other times (though fairly rarely), it actually finishes
         formatting and copying the first disk's files to the
         hard disk without crashing.
       - On the other hand, I've never seen it successfully boot from
         the hard disk without this patch.  An attempt to boot from
         the hard drive always panics quite early.
   3. I tried -win2k-hack instead, thinking maybe the hard disk is just
      responding faster than UNIX expected.  But it doesn't seem
      to have any effect.  UNIX still panics sporadically the same way.
       - TANGENT: I was going to see if my patch provides an
         alternative fix for installing Windows 2000, but
         I was unable to reproduce the original -win2k-hack problem at
         all (with neither -win2k-hack NOR this patch).  Maybe
         some other change has fixed it some other way?  Or maybe
         it is only an issue in configurations I didn't test?
         (KVM instead of TCG?  Less RAM?  Something else?)
            It might be worth doing a little more investigation,
         and eliminating the -win2k-hack option if appropriate.
   4. If I enable KVM, I get a different error very early in
      bootup (in splx function instead of splint), and this patch
      doesn't help.

============
My low level analysis of what is going on:

It is hard to track down all the details, but based on logging a
lot of qemu IRQ stuff, and setting a breakpoint in the earliest
panic-related UNIX function using gdb, it looks like:

   1. It is near the end of servicing a previous IRQ14 from the
      hard disk.
   2. The processor has interrupts disabled (I think), while UNIX
      clears the slave 8259's IMR (mask) register (sets it to 0), allowing
      all interrupts to be passed on to the master.
   3. While in that state, IRQ14 is raised (on the slave), which
      gets propagated to the master (IRQ2), but the CPU
      is not interrupted yet.
   4. UNIX then masks the slave 8259 completely (sets its IMR
      to 0xff).
   5. Because the master's ELCR register is set (by BIOS; UNIX never
      touches it) to edge trigger for IRQ2, the master latched on
      to IRQ2 earlier, and continues to assert the processor's INT line
      (the env->interrupt_request&CPU_INTERRUPT_HARD bit) even
      after all slave IRQs have been masked off (clearing the input
      IRQ2).
   6. Finally, UNIX enables CPU interrupts and the interrupt is delivered
      to the CPU, which ends up as a spurious IRQ15 because the
      slave's IMR now masks everything.  UNIX doesn't know what to do
      with that, and panics/halts.

I'm not sure why it only sporadically hits this sequence of events.
There don't seem to be any other IRQs asserted or serviced anywhere
in the near past; the last several were all IRQ14s.  But I can't
help feeling I'm not reading the log output correctly or something,
because that doesn't make sense.  Maybe there is some kind
of a-few-instructions delay before a CPU interrupt is actually
delivered after interrupts are enabled, or some delay in raising
IRQ14 after a hard drive operation is requested, and such delays
need to fall into a narrow window of opportunity left by UNIX?

I can get a disassembly of the UNIX kernel using a "coff"-enabled
build of GNU objdump, giving function names but not much else.
But I haven't studied it in enough detail to actually find the
relevant code path that is manipulating imr as described above.
However, this old post outlines some of the high level theory
of UNIX spl*() functions:
http://www.linuxmisc.com/29-unix-internals/4e6c1f6fa2e41670.htm

If anyone wants to look into this further, I can provide access to the
initial boot install floppy, at least.  Email me.  (Without the rest
of the install disks, it isn't much use for anything except testing
virtual machines like qemu against rare corner cases...)

============
Alternative Approaches:

An alternative to this patch that might work (I haven't tried) would
be to have BIOS set the 0x04 bit of the master's ELCR register, making
IRQ2 level triggered instead of edge triggered.  I'm not sure what other
effects this might have.  Maybe it would actually be a more accurate
model (I haven't checked documentation; maybe "slave mode" of an
IRQ line into the master is supposed to be level triggered?)

Or perhaps find a way to model the minimum timescale that an interrupt
request needs to be active to be recognized?

Or maybe my analysis isn't correct; I wasn't able to find the
relevant code path in the UNIX kernel.

============

 cpu-exec.c      | 12 +++++++-----
 hw/i8259.c      | 18 ++++++++++++++++++
 qemu-options.hx | 12 ++++++++++++
 sysemu.h        |  1 +
 vl.c            |  4 ++++
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 134b3c4..c309847 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -329,11 +329,15 @@ int cpu_exec(CPUArchState *env)
                                                           0);
                             env->interrupt_request &= ~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
                             intno = cpu_get_pic_interrupt(env);
-                            qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing hardware INT=0x%02x\n", intno);
-                            do_interrupt_x86_hardirq(env, intno, 1);
-                            /* ensure that no TB jump will be modified as
-                               the program flow was changed */
-                            next_tb = 0;
+                            if (intno >= 0) {
+                                qemu_log_mask(CPU_LOG_TB_IN_ASM,
+                                              "Servicing hardware INT=0x%02x\n",
+                                              intno);
+                                do_interrupt_x86_hardirq(env, intno, 1);
+                                /* ensure that no TB jump will be modified as
+                                   the program flow was changed */
+                                next_tb = 0;
+                            }
 #if !defined(CONFIG_USER_ONLY)
                         } else if ((interrupt_request & CPU_INTERRUPT_VIRQ) &&
                                    (env->eflags & IF_MASK) && 
diff --git a/hw/i8259.c b/hw/i8259.c
index 6587666..7ecb7e1 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -26,6 +26,7 @@
 #include "isa.h"
 #include "monitor.h"
 #include "qemu-timer.h"
+#include "sysemu.h"
 #include "i8259_internal.h"
 
 /* debug PIC */
@@ -193,6 +194,20 @@ int pic_read_irq(DeviceState *d)
                 pic_intack(slave_pic, irq2);
             } else {
                 /* spurious IRQ on slave controller */
+                if (no_spurious_interrupt_hack) {
+                    /* Pretend it was delivered and acknowledged.  If
+                     * it was spurious due to slave_pic->imr, then
+                     * as soon as the mask is cleared, the slave will
+                     * re-trigger IRQ2 on the master.  If it is spurious for
+                     * some other reason, make sure we don't keep trying
+                     * to half-process the same spurious interrupt over
+                     * and over again.
+                     */
+                    s->irr &= ~(1<<irq);
+                    s->last_irr &= ~(1<<irq);
+                    s->isr &= ~(1<<irq);
+                    return -1;
+                }
                 irq2 = 7;
             }
             intno = slave_pic->irq_base + irq2;
@@ -202,6 +217,9 @@ int pic_read_irq(DeviceState *d)
         pic_intack(s, irq);
     } else {
         /* spurious IRQ on host controller */
+        if (no_spurious_interrupt_hack) {
+            return -1;
+        }
         irq = 7;
         intno = s->irq_base + irq;
     }
diff --git a/qemu-options.hx b/qemu-options.hx
index 03e13ec..57bb0b4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1188,6 +1188,18 @@ Windows 2000 is installed, you no longer need this option (this option
 slows down the IDE transfers).
 ETEXI
 
+DEF("no-spurious-interrupt-hack", 0, QEMU_OPTION_no_spurious_interrupt_hack,
+    "-no-spurious-interrupt-hack     disable delivery of spurious interrupts\n",
+    QEMU_ARCH_I386)
+STEXI
+@item -no-spurious-interrupt-hack
+@findex -no-spurious-interrupt-hack
+Use it as a workaround for operating systems that drive PICs in a way that
+can generate spurious interrupts, but the OS doesn't handle spurious
+interrupts gracefully.  (e.g. late 80s/early 90s versions of ATT UNIX
+and derivatives)
+ETEXI
+
 HXCOMM Deprecated by -rtc
 DEF("rtc-td-hack", 0, QEMU_OPTION_rtc_td_hack, "", QEMU_ARCH_I386)
 
diff --git a/sysemu.h b/sysemu.h
index 65552ac..0170109 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -117,6 +117,7 @@ extern int graphic_depth;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
+extern int no_spurious_interrupt_hack;
 extern int alt_grab;
 extern int ctrl_grab;
 extern int usb_enabled;
diff --git a/vl.c b/vl.c
index 16d04a2..6de41c1 100644
--- a/vl.c
+++ b/vl.c
@@ -204,6 +204,7 @@ CharDriverState *serial_hds[MAX_SERIAL_PORTS];
 CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
 CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
 int win2k_install_hack = 0;
+int no_spurious_interrupt_hack = 0;
 int usb_enabled = 0;
 int singlestep = 0;
 int smp_cpus = 1;
@@ -3046,6 +3047,9 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_win2k_hack:
                 win2k_install_hack = 1;
                 break;
+            case QEMU_OPTION_no_spurious_interrupt_hack:
+                no_spurious_interrupt_hack = 1;
+                break;
             case QEMU_OPTION_rtc_td_hack: {
                 static GlobalProperty slew_lost_ticks[] = {
                     {
-- 
1.7.10.2.484.gcd07cc5



