qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64


From: Frediano Ziglio
Subject: Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
Date: Fri, 9 Jan 2015 16:08:56 +0000

2015-01-09 15:52 GMT+00:00 Peter Maydell <address@hidden>:
> On 9 January 2015 at 15:41, Frediano Ziglio <address@hidden> wrote:
>> 2015-01-09 12:22 GMT+00:00 Peter Maydell <address@hidden>:
>>>> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
>>>
>>> This assumption isn't necessarily true, and this implementation
>>> will dump core on overflow.
>
>> Yes, I know, it was meant to be a precondition, not a math rule. Doing
>> some grep I'm not really sure this is valid in all cases however I
>> would ask if the truncation is handled correctly. Surely in some cases
>> the call to this function is not needed like in "muldiv64(1, tks,
>> usb_bit_time);" or in "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"
>
> Sure, there are some cases where we know the calculation won't overflow,
> but there are also cases where we can't be so sure, and if the parameters
> can be controlled by the guest then we absolutely can't allow the guest
> to cause QEMU to SIGFPE.
>
> Incidentally, in the examples you quote gcc is generally able
> (because the function is inline) to do a better job because of
> some of the constants involved: for instance with
>   "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"
> it generates a completely inline insn sequence with one mul and
> no div insns.
>
> So I think at this point I come back to an earlier question:
> do you have profiling results showing this function to be
> a performance problem on a real workload? If not, it just doesn't
> seem to me to be worth the risk and the maintenance burden to
> add a native x86 assembly version.
>
> -- PMM

I agree (after some digging) we are not sure we won't get that
overflow. Agree to drop the second patch. However I would retain the
first. Compiler can use it to optimize much easier. For instance if
compiler understand that the multiplication fits into a 64 bit can
decide to avoid the 128 bit operation easily, not so easy with all
these shift, multiply, division and union structure.

I didn't do any profiling.

Frediano



reply via email to

[Prev in Thread] Current Thread [Next in Thread]