octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: OSKI, an automatically tuned sparse kernel library.


From: Richard Vuduc
Subject: Re: OSKI, an automatically tuned sparse kernel library.
Date: Wed, 29 Jun 2005 10:24:07 -0500
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050317)

David Bateman wrote:
It looks interesting. Octave at the moment has a pretty good sparse matrix mul code, that is significantly faster than matlab for most densities. Matlab beats octave for very low densities.

This is very interesting, and is the sort of issue OSKI addresses since tuning is both matrix- and machine-specific.

> This was written
by Andy Adler and so perhaps he can comment on whether OSKI makes sense as a replacement for his code or not. As for the triangular solve code I just copied the algorithm from the blas {d,z}trsm function and adpated it for sparse matrices.... So you might be faster there as well. Motivation to try your code > might be lacking however
due to a severe lack of time :-)

OSKI is designed to "piggyback" on existing code that already uses compressed sparse row (CSR) or column (CSC) sparse matrix formats. You provide OSKI with your CSR/CSC representation, and the library returns a handle. There is a special "sharing" mode in which you allow OSKI to keep pointers to your CSR/CSC data structure, and OSKI promises not to change it. By sharing, there are no extra copies of the matrix floating around. You'd carry the handle around with your own data structure, and selectively either use your own routines, or call OSKI's routines on the handle.

As I recall, Matlab uses CSC, and Octave uses either CSR or CSC for compatibility with SuperLU and UMFPACK, so OSKI is at least relevant.

I'm not claiming that using OSKI is a piece of cake, and I certainly wasn't expect any of you to say, "Oh yeah, I'm gonna start using it right away!" :-) I will certainly try to interest an OSKI developer into doing this as a proof-of-concept. I mostly just wanted your team to be aware of OSKI.

One question I have is the issue of tuning. The workload limits you set seems to imply that the code in OSKI is supposed to be used in a kernel

You are right that we target iterative methods, and it seems reasonable that modifying your dependent libraries (e.g., ARPACK) could be a better way to for Octave to experiment with OSKI indirectly.

triangular solve functions, its better to avoid tuning. What are the gains you might expect from OSKI in the case where there is very little tuning, and the calculation is performed once? Is it in general better than what we already have?

Since OSKI piggybacks on your code, if no tuning has happened and you use the "shared" mode, then OSKI just uses your data structure. So there should be no performance hit in this case. One of our important design goals is that if you bothered to use OSKI at all, it shouldn't hurt you.

The potential benefits depend strongly on the types of matrices you users are using. Much of the tuning in OSKI currently targets applications that use the finite-element method, where 1.5--4x speedups over CSR for mat-vec are possible on a variety of platforms. We also have some techniques for getting a similar range of improvements when the matrix really is randomly structured with a sufficient number (maybe 20 or more?) non-zeros per row. We don't yet have proven techniques to handle extremely sparse matrices (just a few non-zeros per row) with no diagonal or dense block substructure whatsoever. If you can say what "typical" Octave users deal with, that might be useful information for us.

--rich



reply via email to

[Prev in Thread] Current Thread [Next in Thread]