
[Qemu-devel] Windows 98 installer


From: Michael Karcher
Subject: [Qemu-devel] Windows 98 installer
Date: Mon, 24 Oct 2011 20:21:38 +0200

Hello developers,

there are plenty of reports on the internet that the Windows 98
installer crashes or hangs in qemu. I made the effort to track down what
causes these problems, and I think I found out the core reason, which
seems to be a bug in the Microsoft DOS Extender DOSX.

The Windows 95/Windows 98 installers are Windows 3.1 applications, and
the setup media contain the Windows 3.1 kernel for "standard mode", i.e.
the 286 mode of Windows 3.1. The lowest layer of Windows 3.1 running in
standard mode is the Microsoft DOS extender, which amongst other things
provides a DPMI host implementation and does interrupt management. The
crashes of the Windows 98 installer I could observe were caused by
overflowing the number of interrupt stacks inside DOSX, which can happen
if interrupts are generated faster than they are handled.

The code path is like this:

While DOSX is active and executing real-mode code with interrupts
enabled, an interrupt occurs (e.g. the timer interrupt). All real mode
interrupt handlers are hooked by DOSX, so control is transferred to the
corresponding interrupt handler in dosx. The handler for interrupts
occurring in real mode reflects the interrupt to protected mode. The
reflection to protected mode happens on one of the internal interrupt
stacks inside DOSX. After setting up the interrupt stack and looking up
the protected mode handler, an interrupt return frame for the protected
mode handler is set up containing the flag register value that was
active when the real-mode handler in DOSX was entered (i.e. the return
flags from the DOSX handler are copied to the interrupt stack).

The protected mode interrupt handler in SYSTEM.DRV then at some point
decides to chain to the original protected mode interrupt handler inside
DOSX, either by jumping to the handler re-using the return frame (so the
DOSX handler sees the same return flags that the code which reflected
the interrupt to protected mode saw), or on another
code path that has the same net effect [skipped as it does not matter
for the issue here].

So now DOSX is entered again. The default protected mode interrupt
handler then decides to reflect the interrupt to real mode - to all the
code that hooked the interrupt before DOSX was called. Just as for the
reflect-to-protected-mode code, the reflect-to-real-mode code also
allocates an interrupt stack from the stacks inside DOSX, switches to
that stack, and finally calls the original handler (this time in real
mode), with the return frame having the same flags as the return frame
of the reflection handler.

Long story short: the flags from when the hardware interrupt handler
was entered are passed into the return frame that the first reflecting
handler builds for the protected mode handler. The flags from this
return frame are then passed into the return frame that the second
reflecting handler builds for the real mode handler. As interrupts were
enabled at the start of that chain (otherwise, it would not have
started), we know that the interrupt flag is set in the return frame of
the real-mode handler. Also, note that two interrupt stacks got
allocated during this process. (The total number of interrupt stacks is
12 by default, and the system.ini provided with the Windows 98 installer
does not override it.)
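To make the flag propagation concrete, here is a minimal Python model of
the chain described above. It is a sketch under stated assumptions, not
DOSX code: the function names are hypothetical, each return frame is
reduced to its saved FLAGS value, and the only architectural fact used is
that IF is bit 9 (0x0200) of the x86 FLAGS register.

```python
IF_MASK = 0x0200  # interrupt-enable flag (IF), bit 9 of x86 FLAGS

def reflect(entry_flags: int, stacks_free: int) -> tuple[int, int]:
    """Hypothetical model of one DOSX reflection step: take one internal
    interrupt stack and build a return frame that copies the flags that
    were active when the reflecting handler was entered."""
    return entry_flags, stacks_free - 1

def timer_tick_chain(hw_flags: int, stacks_free: int = 12) -> tuple[int, int]:
    # Real mode -> protected mode (handler in SYSTEM.DRV).
    pm_flags, stacks_free = reflect(hw_flags, stacks_free)
    # SYSTEM.DRV chains to DOSX re-using its frame, so DOSX sees pm_flags
    # and reflects the interrupt to the original real-mode handler.
    rm_flags, stacks_free = reflect(pm_flags, stacks_free)
    return rm_flags, stacks_free

# The tick could only fire with IF set, so IF is still set in the last frame.
rm_flags, free = timer_tick_chain(hw_flags=0x0202)
print(bool(rm_flags & IF_MASK), free)  # True 10
```

The model only shows why IF survives both reflections and why two of the
twelve stacks are held while the real-mode handler runs.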

Now let's assume that, for some reason, the real-mode handler of the
timer interrupt takes more than 55ms to execute (or the host schedules
another process instead of qemu, so that less than 55ms of real CPU time
is available between two timer ticks). Then the next timer tick is
pending as soon as the real-mode handler of the timer interrupt returns
into the reflect-to-real-mode handler, which is going to switch back to
protected mode, return to either SYSTEM.DRV or the
reflect-to-protected-mode handler, and free the interrupt stack used for
reflection to real mode. BUT, as we know, the interrupt flag is set in
the interrupt return frame for the real-mode handler, which causes qemu
to accept the next timer interrupt directly after the real mode handler
returns, with two interrupt stacks still allocated.

Each nested tick allocates two more stacks, so if the nesting level gets
to six, all 12 interrupt stacks are in use. DOSX still allocates further
stack frames beyond that, resulting in the stack pointer pointing into
the data segment of DOSX and damaging important data structures, which
will crash the system some time later.
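The arithmetic of the overflow can be checked with a small simulation.
This is a hypothetical Python sketch, assuming (as described above) a pool
of 12 stacks, two stacks per nesting level, and that each new tick is
accepted before the previous one has released its stacks; `nest_ticks` is
an illustrative name, not anything in DOSX or qemu.

```python
NUM_STACKS = 12        # DOSX default, not overridden by the installer's system.ini
STACKS_PER_LEVEL = 2   # reflect-to-protected-mode + reflect-to-real-mode

def nest_ticks(levels: int, num_stacks: int = NUM_STACKS) -> tuple[int, bool]:
    """Hypothetical model: `levels` timer ticks, each accepted before the
    previous one released its stacks. Returns (stacks in use, overflowed)."""
    in_use = 0
    overflowed = False
    for _ in range(levels):
        for _ in range(STACKS_PER_LEVEL):
            if in_use >= num_stacks:
                # No free stack left: the real DOSX would now place the
                # stack in memory belonging to its own data segment.
                overflowed = True
            in_use += 1
    return in_use, overflowed

print(nest_ticks(6))  # (12, False): every stack in use
print(nest_ticks(7))  # (14, True): the seventh tick overflows
```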

If you know the 8086 architecture by heart, and also know the qemu code,
you might suspect an emulation bug causing the premature acceptance of
the second interrupt (if it were only accepted after the stack frames
had been cleaned up, there would be no problem), namely the rule that
after an IRET or STI instruction, interrupts are only accepted after one
further instruction, and only if they are still enabled. So *if* the
real-mode handler returned to a CLI instruction, a real 8086-compatible
CPU would not accept an interrupt between the IRET and the CLI. Indeed,
the DOSX code contains a CLI instruction in the code that tears down the
allocated interrupt stack after the real mode handler returned, but it
is the third instruction, not the first, which is too late even on real
hardware. To be exact, the code at the return point of the real mode
handler inside DOSX is "pop ds / pushf / cli".
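The one-instruction window can be illustrated with a toy model. This is a
hypothetical Python sketch that simply encodes the delay rule as stated
above (one instruction always runs after IRET, after which a pending
interrupt is taken while IF is set); it is not qemu's implementation and
only tracks CLI.

```python
from typing import List, Optional

def accept_index(code_after_iret: List[str], if_set: bool = True) -> Optional[int]:
    """Hypothetical model of the one-instruction delay described above.
    Returns the index of the instruction a pending interrupt preempts,
    or None if IF is cleared before the interrupt can be taken."""
    if_flag = if_set
    for i, insn in enumerate(code_after_iret):
        if i >= 1 and if_flag:
            return i           # interrupt taken before code_after_iret[i]
        if insn == "cli":
            if_flag = False    # CLI clears IF and closes the window
    return len(code_after_iret) if if_flag else None

print(accept_index(["pop ds", "pushf", "cli"]))  # 1: taken before pushf
print(accept_index(["cli", "pop ds", "pushf"]))  # None: CLI came first
```

With the DOSX sequence, the interrupt slips in before the PUSHF, so the
CLI arrives too late regardless of how faithfully the delay is emulated.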

I don't have any solution for that problem at hand, and I can't say for
sure that this nesting of timer interrupts really is a problem if you
don't trace qemu with "-d in_asm,cpu" (not tracing should make it
faster), but the kinds of crashes I saw with and without tracing were
similar, so I expect interrupt stack overflow to cause the crashes
observed in the Windows 98 installer.

The main reason I am writing this mail is to archive the knowledge about
what happens inside the installer, so this tedious tracing process
doesn't have to be reproduced by somebody else interested in fixing the
problem, but I am happy to hear suggestions on how this problem can be
fixed or worked around.

Regards,
  Michael Karcher
