discuss-gnuradio


From: Abhishek Bhowmick
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
Date: Mon, 10 Mar 2014 20:03:05 +0530

Hello,
I would like to clarify a few things:

1. I feel it is tough to beat Spiral implementations with manual
vectorization, performance-wise. If so, is readability the main, and
only, reason why hand-written intrinsics are of value to the community?

2. What is the current state of adding SSE4 and NEON support to the
stock VOLK kernels (the project ideas page mentions that some work is
under way)? It would be great if someone who is already working on this
could share their branch, so that I know how much work, if any, is
needed there before moving on to AVX. Of course, new kernels will need
support for all of them.

3. How feasible/useful would it be to incorporate the newly added
'turbo equalizer' idea into the OFDM system? Are the requirements of
the proposed equalizer overkill for the OFDM blocks?

Abhishek

On Wed, Feb 26, 2014 at 1:49 AM, Abhishek Bhowmick
<address@hidden> wrote:
> Thanks everyone. These are quite a few pointers, I will spend some time
> digesting it all.
>
> So there are really two approaches, large complex kernels on
> one hand and AVX2/AVX/FMA on the other, or a combination of the two.
>
> I guess I should propose identifying and implementing larger complex
> kernels and then further accelerating them using AVX2/FMA etc. Doing
> both will of course limit the number of applications/algorithms I can
> feasibly target. What's your take on this?
>
> Abhishek
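One way to read "larger complex kernels": instead of chaining several small kernels through temporary buffers, one kernel does the whole computation in a single pass over memory. The sketch below contrasts a two-pass complex conjugate dot product (elementwise multiply into a scratch buffer, then a sum) with a fused single pass. Both functions are hypothetical scalar illustrations, not VOLK code; a real kernel would add SIMD protokernels on top of the fused loop, which is where AVX2/FMA would come in.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical two-pass version: an elementwise kernel and a reduction
 * kernel chained through a scratch buffer, so the data streams through
 * memory twice. */
static float complex conj_dot_two_pass(const float complex *a,
                                       const float complex *b, size_t n)
{
    float complex *tmp = malloc(n * sizeof *tmp);
    assert(tmp != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        tmp[i] = a[i] * conjf(b[i]);   /* pass 1: a[i] * conj(b[i]) */

    float complex acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += tmp[i];                 /* pass 2: sum */
    free(tmp);
    return acc;
}

/* Hypothetical fused version: one larger kernel, one pass, no scratch. */
static float complex conj_dot_fused(const float complex *a,
                                    const float complex *b, size_t n)
{
    float complex acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * conjf(b[i]);
    return acc;
}
```

Both return the same value; the fused version skips the allocation and the second trip through memory, which tends to matter more than instruction-level tricks once the vectors no longer fit in cache.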
>
> On Wed, Feb 26, 2014 at 5:03 AM, West, Nathan
> <address@hidden> wrote:
>> On Tue, Feb 25, 2014 at 4:37 PM, West, Nathan
>> <address@hidden> wrote:
>>>>  > On Sun, 2/23/14, Abhishek Bhowmick <address@hidden> wrote:
>>>>  >
>>>>  >  Subject: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
>>>>  >  To: address@hidden
>>>>  >  Date: Sunday, February 23, 2014, 8:52 AM
>>>>  >
>>>>  >  Hello,
>>>>  >  I have completed a Bachelor's degree in Electrical Engineering from
>>>>  >  IIT Bombay, India, and will be joining a master's program in Computer
>>>>  >  Science in August. For the summer, I am interested in participating in
>>>>  >  GSoC 2014, and GNU Radio is an organization where my background fits
>>>>  >  nicely.
>>>>  >
>>>>  >  I went through the ideas page and was particularly interested in doing
>>>>  >  performance optimization with VOLK. After going through some online
>>>>  >  documentation about the library and the SDR'12 paper, I realised that
>>>>  >  the following areas need work:
>>>>  >
>>>>  >  1. Profiling GNU Radio code to identify new kernels and implementing
>>>>  >  them for existing Intel SIMD extensions, as well as porting kernels to
>>>>  >  other ISA extensions.
>>>>  >  2. Better testing of the effects of more complex scheduler logic in
>>>>  >  larger environments (beyond simple kernels).
>>>>  >
>>>>  >  3. Exploring an extension of VOLK to GPU ISAs, to leverage chips such
>>>>  >  as AMD Fusion (however, this seems to be more research than software
>>>>  >  development).
>>>>  >
>>>>  >  According to the GSoC proposal, point (1) seems to be the expectation.
>>>>  >  Given this, I would like some advice on how to go about looking for
>>>>  >  potential ideas (and some feedback on the feasibility of the other
>>>>  >  ideas as well).
>>>>  >
>>>>  >
>>>>  >  My background: C++, Python, Signal Processing, Computer Architecture
>>>>  >
>>>>  >  Thanks,
>>>>  >  Abhishek Bhowmick
>>>>  >
>>>
>>>
>>> This is a great conversation, and I'll take the opportunity to plug
>>> the upcoming VOLK working group call
>>> (https://plus.google.com/u/1/events/ch3jrjcvp7mdiqelpismfieg3n0).
>>> Bogdan, your results aren't particularly surprising, but the feedback
>>> is really good to hear.
>>>
>>> Back to GSoC:
>>>
>>> Abhishek,
>>>
>>>>Thanks for the pointers to gr-atsc and gr-80211. I have started
>>>>looking there. Are there similar modules that are undergoing VOLK
>>>>speedup work? I am also trying to meet up with other people who have
>>>>been using GNU Radio to identify potential modules for acceleration.
>>>>As you are now a mentor organization, I feel it's a good time for us
>>>>to get into detailed discussions.
>>>
>>> From the previous discussion it should be apparent that how algorithms
>>> are implemented will make the biggest difference, and that new
>>> acceleration is primarily going to come from larger, more complex
>>> kernels. At the end of the day it's going to be your proposal. So far,
>>> the list of places to look includes:
>>>
>>> * in-tree OFDM (contact Martin)
>>> * gr-atsc (use Andrew Davis' fork)
>>> * gr-dvbt
>>> * gr-fecapi
>>>
>>> For your proposal I would recommend looking at their code, then
>>> getting in contact with the author(s) of those modules to ask about
>>> their thoughts on accelerating the blocks they have written. The
>>> reality of this project is that we are accelerating some signal
>>> processing algorithm, and knowledge of that algorithm is useful for
>>> acceleration. Whatever application you have interest and/or knowledge
>>> in (fresh out of a BS, it's more likely to be interest) should guide
>>> your proposal. If you know anything about error-correcting codes, then
>>> the latter two would be good fits. OFDM frame detection probably has a
>>> gentler learning curve, since at the basic level you're looking at
>>> convolution, and there are papers you can look for on more involved
>>> algorithms. Other algorithms to look at might include AGC or
>>> equalizers.
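As a rough illustration of the "convolution at the basic level" point, the sketch below detects a known preamble by sliding a complex cross-correlation over the received samples and picking the peak. It assumes a single known preamble, no carrier frequency offset and no noise handling; `find_preamble` is a hypothetical helper, not a GNU Radio block or VOLK kernel. Practical OFDM detectors are more involved, but their inner loop is the same kind of conjugate dot product that SIMD kernels accelerate.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: slide a known preamble p (length m) over received
 * samples r (length n) and return the offset whose correlation has the
 * largest magnitude. Naive O(n*m); the inner loop is a conjugate dot
 * product, i.e. the part a SIMD kernel would speed up. */
static size_t find_preamble(const float complex *r, size_t n,
                            const float complex *p, size_t m)
{
    assert(m > 0 && n >= m);
    size_t best_off = 0;
    float best_mag2 = -1.0f;
    for (size_t off = 0; off + m <= n; off++) {
        float complex acc = 0.0f;
        for (size_t i = 0; i < m; i++)
            acc += r[off + i] * conjf(p[i]);
        /* compare |acc|^2 to avoid a square root */
        float mag2 = crealf(acc) * crealf(acc) + cimagf(acc) * cimagf(acc);
        if (mag2 > best_mag2) {
            best_mag2 = mag2;
            best_off = off;
        }
    }
    return best_off;
}
```

With a 5-sample preamble embedded at offset 7 in an otherwise empty buffer, the full overlap yields a correlation of 5 while every partial overlap stays at or below 4, so the function returns 7.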
>>>
>>> If you're interested in GPU programming, don't forget to check out gr-gpu.
>>>
>>>>
>>>>>
>>>>> At the moment the only mainstream ISA not being targeted is probably
>>>>> AVX2, which has some nice features for the type of kernels we're
>>>>> doing. If you went that route, it would likely mean adding
>>>>> protokernels to a pretty large number of kernels.
>>>>>
>>>>> Nathan
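For readers new to the term: a VOLK kernel is one operation with several interchangeable implementations ("protokernels"), one per instruction set, all sharing the same signature. The sketch below mimics that structure for a hypothetical elementwise float multiply with a generic and an SSE protokernel. The names and the compile-time selection are simplifications for illustration; VOLK itself picks a protokernel at runtime from the detected CPU features and, if it has been run, the results of `volk_profile`. Adding AVX2 to a kernel means writing one more function of this shape.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical kernel "multiply two float vectors", written as several
 * protokernels with an identical signature. */
static void mul_32f_generic(float *out, const float *a, const float *b,
                            size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

#if defined(__SSE__)
static void mul_32f_sse(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    /* 4 floats per 128-bit register; unaligned loads keep it simple */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
    /* scalar tail for the remaining 0-3 elements */
    for (; i < n; i++)
        out[i] = a[i] * b[i];
}
#endif

typedef void (*mul_32f_fn)(float *, const float *, const float *, size_t);

/* Simplified selection: compile-time only (VOLK dispatches at runtime). */
static mul_32f_fn mul_32f_select(void)
{
#if defined(__SSE__)
    return mul_32f_sse;
#else
    return mul_32f_generic;
#endif
}
```

Every protokernel must produce the same results as the generic one, which is also how new protokernels are usually checked.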
>>>>
>>>>This also seems promising, though I guess it would require me to
>>>>come up to speed with AVX2 (which I would love to do). Could you
>>>>please elaborate a little on the kind of beneficial features you have
>>>>in mind? I am concerned that the job of adding protokernels might turn
>>>>out to be mundane/tedious. Is that a valid concern?
>>>
>>> Right, so as Martin mentioned, the answer is sort of relative. I
>>> wouldn't go so far as to say it's mundane, especially if you have
>>> little experience with using intrinsics and SIMD instructions. One
>>> reason AVX isn't so prominently featured (I suspect) is that the
>>> instructions are almost the same as the SSE instructions, but the
>>> vectors are twice as long, so that part actually is mundane. The
>>> AVX2/FMA extensions introduce some new features to the amd64
>>> instruction set. The most obvious is that Intel and AMD finally seem
>>> to have settled on the same fused multiply-add implementation (there's
>>> also a multiply-subtract that's good for complex numbers). That will
>>> likely speed things up a bit, but I'm also looking forward to seeing
>>> gains from the various gather loads that have been introduced. They
>>> allow you to do a single load operation that gathers vector elements
>>> spanning pretty large ranges. VOLK won't be so interested in the large
>>> ranges (except maybe for decimators), but gathers could be useful for
>>> loading complex vectors. There are some other math functions we may be
>>> able to leverage, but those are two features that I think would be
>>> widely applicable.
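To make the FMA point concrete, here is a sketch of an interleaved complex multiply using AVX and FMA3 intrinsics (no AVX2-specific instructions are needed). `_mm256_fmaddsub_ps` subtracts in even lanes and adds in odd lanes, so the real and imaginary parts come out of one instruction. The function names are hypothetical, not VOLK protokernels. The `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, and the sketch assumes an x86 build, falling back to scalar code elsewhere.

```c
#include <complex.h>
#include <stddef.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SKETCH 1
#endif

/* Scalar reference: out[i] = a[i] * b[i] for complex floats. */
static void cmul_generic(float complex *out, const float complex *a,
                         const float complex *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

#ifdef HAVE_X86_SKETCH
/* AVX + FMA: 4 complex floats (8 floats) per 256-bit register.
 * Relies on float complex being stored as interleaved {re, im}. */
static __attribute__((target("avx,fma"))) void
cmul_avx_fma(float complex *out, const float complex *a,
             const float complex *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256 va = _mm256_loadu_ps((const float *)(a + i)); /* ar ai .. */
        __m256 vb = _mm256_loadu_ps((const float *)(b + i)); /* br bi .. */
        __m256 b_re = _mm256_moveldup_ps(vb);      /* br br .. */
        __m256 b_im = _mm256_movehdup_ps(vb);      /* bi bi .. */
        __m256 a_sw = _mm256_permute_ps(va, 0xB1); /* ai ar .. */
        __m256 t = _mm256_mul_ps(a_sw, b_im);      /* ai*bi ar*bi .. */
        /* even lanes: ar*br - ai*bi, odd lanes: ai*br + ar*bi */
        _mm256_storeu_ps((float *)(out + i), _mm256_fmaddsub_ps(va, b_re, t));
    }
    cmul_generic(out + i, a + i, b + i, n - i); /* scalar tail */
}
#endif

/* Runtime dispatch: take the FMA path only if the CPU reports AVX and FMA. */
static void cmul(float complex *out, const float complex *a,
                 const float complex *b, size_t n)
{
#ifdef HAVE_X86_SKETCH
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma")) {
        cmul_avx_fma(out, a, b, n);
        return;
    }
#endif
    cmul_generic(out, a, b, n);
}
```

Because FMA rounds once instead of twice, results can differ from the scalar path in the last bit for general inputs, so comparisons against a reference should use a tolerance.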
>>>
>>> In your proposal you should definitely include what ISAs you intend to
>>> use, and if there are features specific to that instruction set then
>>> point out why it's a good choice. This is mostly important for
>>> choosing between SSE and friends, AVX, AVX2/FMA. It would be good to
>>> see plans that include NEON support for anything you'd add to amd64
>>> platforms, but that's not a requirement.
>>>
>>>
>>> Nathan
>>
>> I also see that GNSS-SDR made it into GSoC and they have a VOLK-related
>> project.
>> http://gnss-sdr.org/documentation/google-summer-code-2014-ideas-list
>
> Yeah, I also noticed that. I might submit a proposal to them also.
>
> Abhishek



-- 
Regards;
Abhishek Bhowmick,
Senior Undergraduate,
Department of Electrical Engineering,
IIT Bombay.
