qemu-devel

Re: [Qemu-devel] PreP kernels boot using Qemu


From: J. Mayer
Subject: Re: [Qemu-devel] PreP kernels boot using Qemu
Date: Wed, 24 Oct 2007 01:06:07 +0200

On Tue, 2007-10-23 at 23:59 +0200, Aurelien Jarno wrote:
> J. Mayer wrote:
> > On Tue, 2007-10-23 at 12:47 +0100, Thiemo Seufer wrote:
> >> J. Mayer wrote:
> >>> On Tue, 2007-10-23 at 00:05 +0200, Aurelien Jarno wrote:
> >>>> J. Mayer wrote:
> >>>>> On Mon, 2007-10-22 at 18:28 +0200, Aurelien Jarno wrote:
> >>>>>> On Mon, Oct 22, 2007 at 09:36:07AM +0200, J. Mayer wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I've been investigating more about PreP kernel boot using Qemu and I
> >>>>>>> achieved to boot 2.4.35, 2.6.12 and 2.6.22 kernels using Qemu CVS and
> >>>>>>> unmodified OHW.
> >>> [...]
> >>>>>> - The "floating point" problem I reported during the week-end does
> >>>>>>   not exist, probably because of the switch from powerpc to ppc. I
> >>>>>>   still don't know if it is a kernel problem or a QEMU problem (or
> >>>>>>   both).
> >>>>> There may be issues with the floating point emulation, especially if
> >>>>> some kernel or programs relies on the FPSCR (floating-point status)
> >>>>> register which is never updated in Qemu.
> >>>>>
> >>>> Is there any technical reason behind that, or is it just a lack of
> >>>> time?
> >>> I can say both:
> >>> For most programs, using floating point arithmetic à la "fast-math",
> >>> it's not necessary to maintain a precise FPU state, as those programs
> >>> will never raise any FPU exception, never generate NaNs, infinities, ...
> >>> The other reason is that it would need to check every FPU insn's
> >>> arguments and results at run time and treat all special cases following
> >>> the actual PowerPC implementations' behavior if we want to get a
> >>> precise emulation. This behavior could for example be selected at
> >>> compile time: one would then have the choice between a quick FPU
> >>> emulation model and a precise one.
> >> For mips I chose the middle ground: The emulation is architecturally
> >> correct but may not reflect FPU behaviour of the specific silicon.
> >> E.g. one effect is that in certain cases the emulation computes values
> >> close to underflow, while real hardware would throw the (mips FPU
> >> specific) unimplemented exception.
> >>
> >> For most cases this should be good enough, since only specialized
> >> software will rely on a specific implementation's oddities.
> > 
> > Well, what you've done for Mips is exactly what I called the "precise
> > emulation" and is far slower than the "fast math" emulation I got for
> > PowerPC. I was wrong talking about "PowerPC implementations" when I
> > should have said "PowerPC specification"; but there should be no
> > difference between the two (or it's not a PowerPC CPU...) because the
> > POWER/PowerPC specification describes very precisely the behavior of the
> > FPU.
> > The "fast math" model relies on the softfloat-native code and is
> > sufficient for most applications. But there are a few instructions that
> > should always take care of (or maybe at least reset) the FPSCR register,
> > which is not done in the current code.
> > 
> 
> Then I guess it is what has been done on the SPARC target: after each FP
> instruction, check_ieee_exceptions() is called to accumulate the IEEE
> exceptions and generate real exceptions if they are enabled.
> 
> That doesn't look really complex, but I agree that it could slow down the
> emulation a bit. I will take a closer look in two or three weeks.

It's not so complex. What would greatly slow down the emulation is that
you need to use the softfloat model instead of the softfloat-native one
for this to produce the expected result.
The PowerPC "fadd" instruction compiles to just 3 insns on amd64,
using the "fast math" model:
movlpd 0x1b8(%r14),%xmm0 ; /* Load env->ft1 into an SSE register */
addsd  0x1b0(%r14),%xmm0 ; /* Add env->ft0 */
movsd  %xmm0,0x1b0(%r14) ; /* Store the result into env->ft0 */
With the "precise" model, you need to:
1/ Clear the floating point flags
2/ Load operands from env->ft0 & env->ft1 into host registers
3/ Call the float64_add function
4/ Store the result into env->ft0
5/ Compute the architecture specific FPU flags
which leads to executing much more code for each FPU operation and
consumes much more space in the TB buffer.

It's a good idea to allow the use of such a precise model when you want
to run specific applications that rely on the FPU to properly handle
NaNs and infinities and to properly generate exceptions. But, as it's
not needed by most applications, having a "fast math" model is also
great for quicker emulation. I said it would be great to allow the
choice of the model at compile time, but it could in fact be chosen at
run time, just by tweaking the code translator (which should not incur
any performance penalty in the "fast" model case) and compiling the FPU
micro-operations twice, once with CONFIG_SOFTFLOAT defined, once
without. This way, the Qemu user could easily choose between the "fast"
and "precise" models just by changing a switch on the command line.

-- 
J. Mayer <address@hidden>
Never organized
