qemu-devel

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC


From: BALATON Zoltan
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Date: Fri, 21 Feb 2020 19:04:07 +0100 (CET)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Fri, 21 Feb 2020, Peter Maydell wrote:
> On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan <address@hidden> wrote:
>> On Thu, 20 Feb 2020, Richard Henderson wrote:
>>> On 2/18/20 9:10 AM, BALATON Zoltan wrote:
>>>> +    DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true),
>>>
>>> I would also prefer a different name here -- perhaps x-no-fp-fi.
>>
>> What's wrong with hardfloat? That's how the code refers to this so if
>> anyone searches what it does would turn up some meaningful results.
>
> This prompted me to check what you're using the property for.
> The cover letter says:
>> This patch implements a simple way to keep the inexact flag set for
>> hardfloat while still allowing to revert to softfloat for workloads
>> that need more accurate albeit slower emulation. (Set hardfloat
>> property of CPU, i.e. -cpu name,hardfloat=false for that.)
>
> I think that is the wrong approach. Enabling use of the host
> FPU should not affect the accuracy of the emulation, which
> should remain bitwise-correct. We should only be using the
> host FPU to the extent that we can do that without discarding
> accuracy. As far as I'm aware that's how the hardfloat support
> for other guest CPUs that use it works.

I don't know of a better approach. Please see section 4.2.2 Floating-Point Status and Control Register on page 124 in this document:

https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

especially the definitions of the FR and FI bits, and tell me how we can emulate these accurately and still use the host FPU. Not using the FPU even when these bits are not needed (which seems to be the case for all workloads we've tested so far) seriously limits emulation speed. Spending time emulating an obscure and unused part of the architecture when it is not actually needed, just to keep emulation accurate but unusably slow, does not seem to be the right approach.

In an ideal world this should of course be both fast and accurate, but nobody seems to have managed that in the past two years. So maybe we could give up some accuracy now to get usable speed, and worry about emulating obscure features when we come across a workload that actually needs them. For such a workload we would still have the option to revert to accurate but slow emulation until a better way is devised that is both fast and accurate. Insisting on accuracy without offering any solution to the current state just hinders progress on this.

Other PowerPC emulators also seem either not to bother with this or to have a similar optimisation. I've quickly checked three that I know about:

https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L1893
https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L3503
There's also something here, but no mention of the FI bit that I could notice:
https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppccom.cpp#L2023

https://github.com/xenia-project/xenia/blob/master/src/xenia/cpu/ppc/ppc_hir_builder.cc#L428

https://github.com/dolphin-emu/dolphin/blob/master/Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp

But I'm not sure I understand all of the above, so I hope it makes more sense to someone else who can advise.

Regards,
BALATON Zoltan


