Re: Fwd: Re: [fluid-dev] Floats and doubles, simd and interpolation


From: Stefan Kost
Subject: Re: Fwd: Re: [fluid-dev] Floats and doubles, simd and interpolation
Date: Tue, 14 Dec 2010 11:54:51 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Lightning/1.0b2 Thunderbird/3.1.7

Hi,

On 14.12.2010 10:19, David Henningsson wrote:
> Did this message reach the mailinglist? I can't find it in the
> archives, so I'm resending it.

I didn't get it either. Thanks for resending.
>
> Since then, I tried making a hand-optimized SSE version of
> fluid_rvoice_buffers_mix, but it performed worse than ORC's version...
>
> // David
>
> -------- Original Message --------
> Subject: Re: [fluid-dev] Floats and doubles, simd and interpolation
> Date: Sun, 28 Nov 2010 07:11:04 +0100
> From: David Henningsson <address@hidden>
> To: fluid-dev <address@hidden>
>
> On 2010-11-22 09:17, David Henningsson wrote:
>> So the reason I like floats is that with SSE, you can process 4 floats
>> simultaneously, but only 2 doubles. From running perf I know that 2/3
>> of the time (for my testcase) was spent in the interpolation routine.
>> If we can SIMD-ize that, we might get a 3-4x speed improvement; that's at
>> least what I hope for.
>>
>> There is a library called "ORC", anybody heard of it? You write some
>> pseudo-assembly code, and on first run ORC translates it into SSE, MMX,
>> Altivec, etc, or plain old C depending on your hardware. I think it
>> sounds interesting, and was hoping to see if I could make a test soon,
>> but then I got busy trying to find that bug instead.
>
> So a follow-up on this. I have the same testcase as stated earlier
> (FluidR3 sf2 and Dont_you_worry_about_a_thin.mid).
>
> Rendering with doubles takes ~12.3 s, rendering with floats takes ~11.9
> s, that's on a 64 bit Ubuntu Maverick (one core, -z 4096). According to
> perf, here's where we spend the most time:
>
>     41.47%  fluid_rvoice_dsp_interpolate_4th_order
>     21.17%  fluid_iir_filter_apply
>     10.05%  fluid_rvoice_buffers_mix
>      8.18%  fluid_revmodel_processmix
>      5.40%  fluid_chorus_processmix
>      2.75%  fluid_rvoice_write
>
> So since fluid_rvoice_buffers_mix was the simplest one to optimize, I
> tried to make an ORC version. After having downloaded the latest version
> of ORC from Debian Experimental (the one coming with Ubuntu Maverick was
> buggy), I ended up with ~11.1 seconds and fluid_rvoice_buffers_mix (or
> rather a strange orc function) being 5% of the total instead of 10%. I
> also spent some time looking at the iir_filter_apply and interpolation
> functions.
>
> So experiences from this experiment:
>
>  - ORC is still immature, and does not seem to be able to handle more
> complex things like iir_filter_apply and 4th-order interpolation yet.

Yes, orc is indeed quite a young project. Luckily it is developing
quickly. If you have trouble implementing an IIR filter in orc but can
do it in plain SSE, it would be worth filing a bug (or just sending an
email to David Schleef). It is not too difficult to add more opcodes to
orc. He is a video codec guy, so the input on what audio needs has to
come from audio people :)
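
For readers who want to see why an IIR filter is awkward for any SIMD
backend, here is a minimal sketch of a generic direct form II biquad in
plain C. It is not FluidSynth's fluid_iir_filter_apply; the struct and
function names (biquad_t, biquad_apply) are hypothetical and chosen only
for illustration. The point is that each iteration reads the history
values written by the previous one, so four consecutive samples cannot
simply be computed side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for one biquad section (not FluidSynth's struct). */
typedef struct {
    float b0, b1, b2;   /* feed-forward coefficients */
    float a1, a2;       /* feedback coefficients */
    float hist1, hist2; /* the two previous intermediate values */
} biquad_t;

/* Direct form II biquad, applied in place to buf[0..n-1]. */
void biquad_apply(biquad_t *f, float *buf, size_t n)
{
    assert(f != NULL && (buf != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        /* 'center' needs hist1 and hist2, which the previous
         * iterations produced: a loop-carried dependency. */
        float center = buf[i] - f->a1 * f->hist1 - f->a2 * f->hist2;
        buf[i] = f->b0 * center + f->b1 * f->hist1 + f->b2 * f->hist2;
        f->hist2 = f->hist1;
        f->hist1 = center;
    }
}
```

With b0 = 0.5 and a1 = -0.5 (all other coefficients zero) this reduces to
y[n] = 0.5*x[n] + 0.5*y[n-1], which makes the dependency on the previous
output easy to follow by hand.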

>
>  - I was expecting more improvement from ORC - SSE should be able to
> process 4 floats at once, so the time should have decreased by a
> factor of 3-4 rather than a factor of 2. (I haven't tried writing a
> hand-optimized SSE function to compare with.)

Yes, a larger improvement would be expected. The good thing is that once
there is an orc function, fluidsynth automatically becomes more
attractive to embedded folks as well.
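
As a rough illustration of the kind of loop being discussed, the sketch
below shows a mix step of the form dst[i] += amp * src[i], first as a
plain scalar loop and then unrolled by four so that each loop body
corresponds to one 4-float SIMD multiply-add. The function names
(mix_scalar, mix_by_four) and the exact shape of the loop are
assumptions; this is not the code of fluid_rvoice_buffers_mix or of the
ORC version. One plausible factor behind a gain smaller than 4x is that
such a loop loads and stores memory for every multiply-add, so it may be
limited by memory traffic rather than arithmetic, but that is a guess
and would need measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: add a scaled source buffer into the destination. */
void mix_scalar(float *dst, const float *src, float amp, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += amp * src[i];
}

/* Same result, processed four samples per iteration. 'restrict' tells
 * the compiler the buffers do not overlap, which helps it emit SIMD
 * code for the unrolled body. */
void mix_by_four(float *restrict dst, const float *restrict src,
                 float amp, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] += amp * src[i + 0];
        dst[i + 1] += amp * src[i + 1];
        dst[i + 2] += amp * src[i + 2];
        dst[i + 3] += amp * src[i + 3];
    }
    /* Tail: leftover samples when n is not a multiple of four. */
    for (; i < n; i++)
        dst[i] += amp * src[i];
}
```

Unlike the IIR filter, no iteration here depends on another, which is
why this function is the easy case.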
>
>  - In addition, the iir_filter_apply function is difficult to SIMD optimize
> since every sample depends on the previous sample, via the
> dsp_centernode variable.
>
>  - The interpolate_4th_order function (which is the standard order) is
> difficult to SIMD optimize due to loop conditions (where you sometimes
> have to interpolate over sample points in both loop start and loop end
> for the same destination sample).
>
>  - Do we really need more performance? Today's computers can handle
> thousands of voices in real-time, and if you have an old computer you
> might not have SSE anyway...
>
>  - Even though SIMD doesn't seem worth the effort at this point, I'd
> still like to revisit the float vs doubles question. On my amd64, floats
> seem slightly faster than doubles. So my question is: when or what do we
> gain from the increased precision? So far, the only thing I've heard is
> this:
> http://lists.nongnu.org/archive/html/fluid-dev/2010-09/msg00053.html
> Victor, can you follow up, perhaps redo the listening test with latest
> trunk with the float bug fixed and see if there still are quality
> differences?

+1 for float.
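
For anyone who wants to check where the extra precision of doubles could
matter in principle, here is a small, generic sketch that adds the same
step to an accumulator many times in single and in double precision. The
function names are hypothetical and the loop is not taken from
FluidSynth; whether any such drift is audible in the synth is exactly
the open question above.

```c
#include <assert.h>

/* Add 'step' to an accumulator n times in single precision.
 * 'volatile' keeps the running sum rounded to float on every step. */
float accumulate_float(float step, long n)
{
    assert(n >= 0);
    volatile float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += step;
    return acc;
}

/* The same accumulation in double precision. */
double accumulate_double(double step, long n)
{
    assert(n >= 0);
    volatile double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc += step;
    return acc;
}
```

Comparing accumulate_float(0.1f, 1000000) and
accumulate_double(0.1, 1000000) against the ideal value 100000 shows a
much larger deviation for the float version, because rounding error
builds up once the running sum is large relative to the step.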

Stefan
>
> // David
>
>
> _______________________________________________
> fluid-dev mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/fluid-dev