octave-maintainers

Re: Wrapper for vectorized math libraries


From: Dan Davis
Subject: Re: Wrapper for vectorized math libraries
Date: Sat, 1 Oct 2011 12:26:56 -0400

On Fri, Sep 30, 2011 at 6:17 PM, Christopher Hulbert
<address@hidden> wrote:
> On Fri, Sep 30, 2011 at 4:07 PM, Dan Davis <address@hidden> wrote:
>> On Fri, Sep 30, 2011 at 2:05 PM, Christopher Hulbert
>> <address@hidden> wrote:
>>>
>>> Have you looked at the vector, signal, and image processing library
>>> (VSIPL) [1]? There is also VSIPL++ which is a specification (serial
>>> and parallel specifications) [1]. This was created initially under
>>> DARPA funding to define a portable high-performance API for a number
>>> of computational functions. I have at times used ACML, MKL, vanilla
>>> BLAS and LAPACK, FFTW, etc. as backends for that specification. You
>>> will also find presentations by Miriam Leeser (from Northwestern, I
>>> think) who did work on using FPGA and GPU backends in VSIPL. GTRI also
>>> has a GPU release of VSIPL. The reference implementation provides the
>>> unoptimized versions you were looking at implementing when not
>>> compiled with a backend. Unless you want to design your own, you may
>>> find it nice starting from something established, and I do know of
>>> some commercial companies using VSIPL++ in production environments.
>>
>> Thanks for the info.  I may have seen VSIPL before, but it wasn't on
>> my mind when I started working on this.  As nice as it is in terms of
>> a middleware, it misses the point.  What I am working towards is lower
>> level than that with far fewer functions.  The even bigger point is
>> that I would like run-time selection (not compile time) of the best
>> available library.  That way something like Octave, which is packaged
>> via .deb, .rpm, etc. type packages, could link in to the vectorized
>> math functionality that is available on the machine.  So if someone
>> has ACML or IMKL, software like Octave could take advantage of that
>> without recompiling.
>>
>> For things like BLAS, FFTW, etc., which have standardized function
>> calls, I can force that run-time selection if I want using LD_PRELOAD
>> and listing whichever libraries I want to use.  However, for the
>> common math functions, no standard calls exist, and thus, as far as
>> I'm aware, no run-time selection ability exists.
>
> I don't understand how it misses the bigger picture, but you did look
> at these and decide it wouldn't fit so I will not question you there.
> I only mentioned in case you hadn't heard of it. Also, I am not sure
> what "lower level than that" means.
>
> In the VSIPL implementation I modified, setting the environment
> variable VSIP_FFT_D would tell the library at run time which backend I
> wanted, if available; otherwise a default order of libraries available
> at compile time was used. Or compile using the generic BLAS/LAPACK
> interface and use LD_PRELOAD, which would also work.
>
> For the common math functions, you could take the opportunity at an
> initialization point to call a function indicating if a particular
> library is available and check its return value for validity. For
> example, VSIPL defines a function vsip_init. Then you just need a stub
> MKL, ACML, etc. compiled in if the real library is not available.
> However, I have had significant problems building against one version
> of a library (say, MKL) and running against a different one, so forcing
> things at run time with LD_PRELOAD should be approached with caution.
>
> Just a few thoughts!
>
> Chris
>
>>
>> Regards,
>>
>> Dan Davis
>>
>

Chris,

Thanks for the info on VSIPL.  I missed that VSIPL provides options
for run-time backend selection; it is a large specification.  Also,
your caution about run-time forcing with LD_PRELOAD is something I'm
aware of, and it is part of why I'm trying to implement this in a
better way.

By stating that VSIPL is too high-level, what I mean is that it deals
with computing in numerous different "views," and, again, unless I
missed something, it requires a large number of backend libraries.  In
the project I'm developing this for, aside from some level 1 BLAS
calls, I just need the ability to compute things like the log or the
complex exponential of a few hundred thousand numbers at a time.  I
also work on numerous machines, some of which use MKL, some have AMD's
library, and some have nothing.
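
To make that concrete, the whole operation I care about is something
like y[i] = log(x[i]) over an entire array; each vendor just spells it
differently.  Roughly (I am writing these declarations from memory, so
treat the exact names and argument types as illustrative):

    /* Intel MKL, VML interface: */
    void vdLn(int n, const double a[], double r[]);

    /* AMD ACML, fast vector math (libacml_mv): */
    void vrda_log(int n, double *x, double *y);

With neither installed, it is just a loop over log() from <math.h>.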

My first cut at solving my problem was just creating a set of inline
wrapper functions, using the pre-processor to select the correct
library call in those wrappers, and compiling on each machine.  That
works fine, and I actually don't need anything else.  However, I had
an idea for a way to wrap these array formulations of the basic
transcendental functions through to a library selected at run time,
without LD_PRELOAD or any other questionable method of loading them.
Version checking can be done at startup, and the vectorized forms
could be turned off if desired (for instance, since AMD's library only
guarantees accuracy to 2 ulp).  Finally, treating them as plugins
allows the closed-source libraries to be used without needing them
during compilation.
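
For reference, the first cut is nothing fancier than the following
sketch; the USE_MKL/USE_ACML macros and the header names are just what
I use locally, and the vendor calls are from memory, so take all of
them as placeholders:

    #include <math.h>

    #if defined(USE_MKL)
    #  include <mkl.h>        /* declares vdLn() */
    #elif defined(USE_ACML)
    #  include <acml_mv.h>    /* ACML fast vector math */
    #endif

    /* Vectorized natural log: y[i] = log(x[i]) for i = 0..n-1. */
    static inline void my_vlog(int n, double *x, double *y)
    {
    #if defined(USE_MKL)
        vdLn(n, x, y);                  /* Intel MKL VML */
    #elif defined(USE_ACML)
        vrda_log(n, x, y);              /* AMD ACML */
    #else
        for (int i = 0; i < n; i++)     /* plain scalar fallback */
            y[i] = log(x[i]);
    #endif
    }

The obvious downside is the one above: the choice is frozen at compile
time, so the same binary cannot adapt to whatever happens to be
installed on a given machine.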

I also often use Octave, and this is one place where Octave still has
poor performance, largely due to a lack of open-source vectorized
transcendental functions.  Yes, I can recompile Octave with
-mveclibabi=myfavoritelibrary and get some speedup, but then I can't
turn it off if I want the accuracy.  I also can't send that Octave
binary out to anyone else, because it is now built to require a
closed-source library whose source I cannot redistribute.

Thus, before I spend too much time hacking for the sake of hacking, I
pose the question: if I could give Octave the capability to use these
closed-source libraries' vectorized transcendental function calls
without having to distribute them with Octave (and thus, as far as I
can tell, without violating any license), would that capability be
desired?  To be clear, I'm referring only to the calls that act on an
array, not those that act on one or two numbers.  I have no plans
right now to deal with the BLAS calls, FFT calls, etc.
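
To make the proposal a bit more concrete, here is a rough sketch of
the kind of run-time selection I have in mind for each wrapped
function.  The shim library and symbol names (libvmath_mkl.so, vm_log)
are placeholders, not anything that exists yet:

    /* Link with -ldl. */
    #include <dlfcn.h>
    #include <math.h>
    #include <stddef.h>

    /* One function pointer per wrapped vectorized operation. */
    typedef void (*vlog_fn)(int n, const double *x, double *y);
    static vlog_fn vlog_impl = NULL;

    /* Scalar fallback, used when no plugin is found or when the
       vectorized forms are switched off for accuracy reasons. */
    static void vlog_fallback(int n, const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = log(x[i]);
    }

    /* Run once at startup: probe for a shim library, check that it
       provides the symbol we expect, and fall back to the scalar
       loop if anything is missing. */
    void vlog_init(int allow_vectorized)
    {
        vlog_impl = vlog_fallback;
        if (!allow_vectorized)
            return;
        void *handle = dlopen("libvmath_mkl.so", RTLD_LAZY | RTLD_LOCAL);
        if (!handle)
            return;                       /* no plugin installed */
        void *sym = dlsym(handle, "vm_log");
        if (sym != NULL)
            vlog_impl = (vlog_fn) sym;    /* use the plugin */
        else
            dlclose(handle);              /* unusable plugin */
    }

    /* What callers (e.g. Octave's mapper functions) would use. */
    void vlog(int n, const double *x, double *y)
    {
        vlog_impl(n, x, y);
    }

The shim itself (libvmath_mkl.so, libvmath_acml.so, ...) would be a
thin wrapper built separately against each vendor library, so the
program calling this interface never needs the closed-source code at
build time and never ships it in its binary package.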

Certainly, an alternative would be to convert Octave over to VSIPL++.
VSIPL++ is more in line with the way Octave is often used for
processing, with its large variety of views.  But that is a much larger
project.

Regards,
Dan Davis

