octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

FYI: sparse indexed assignment rewritten


From: Jaroslav Hajek
Subject: FYI: sparse indexed assignment rewritten
Date: Tue, 13 Apr 2010 16:54:52 +0200

hi all,

I've nearly finished rewriting the sparse indexed assignment code.
The old incomprehensible quirky implementation is replaced by a shiny
new incomprehensible quirky implementation :)
Well at least I tried to comment more. And the code went down from
1100 to 330 lines.

Notable features (applies to indexing as well):

1. Column operations are the primary target. This makes lot of sense
because Octave uses compressed sparse column format, so operating on
whole columns generally means they don't need to be uncompressed. For
instance, 1D indexing is optimized for sparse column vectors. 2D
indexing optimizes the cases S(:,l:u), S(:,J), S(i, J), S(l:u, J),
S(p, J) where J is an arbitrary vector and i,l,u are scalars, and p is
a permutation. Similarly for indexed assignment and null assignment.

1a. I don't claim the new code is faster everywhere than the old code,
but it seems to be faster in the important cases. If you discover a
significant speed regression, please share your results.

2. nzmax == nnz is no longer true. This is useful when repacking
matrices after an operation that might have introduced some (but few)
zeros. The matrix will be recompressed in place, without the need for
copying.

3. spalloc is no longer a dummy. It is possible to pre-allocate a
matrix and then build it sequentially. See the docstring for spalloc
for more details.
The following benchmark illustrates this:

r = 50000;
c = 5000;
ratio = 0.001;
nz = r * c * ratio;
s = spalloc (r, c, nz);
## build matrix by columns
tic;
for j = 1:c
  s(:, j) = sprand (r, 1, ratio);
endfor
toc
## build matrix by blocks of columns
s = spalloc (r, c, nz);
tic;
for j = 1:10:c
  s(:, j:min(c,j+9)) = sprand (r, 10, ratio);
endfor
toc

with a recent tip, Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native, I get:

address@hidden:~/devel/octave/main> octave -q ttspa.m
Elapsed time is 25.774 seconds.
Elapsed time is 17.2972 seconds.

with the new changes, I get:

address@hidden:~/devel/octave/main> ./run-octave -q ttspa.m
Elapsed time is 1.32196 seconds.
Elapsed time is 0.189747 seconds.

and with increasing number of columns the factors increase.

4. Most error messages are no longer embedded in code but shared with
the Array code. The same error IDs are used for common messages.

I intend to put together a more comprehensive benchmark, if there's enough time.

Open problems:

1. over-allocated matrices are not yet economized when they are
stored. It should be possible to reuse the existing storable_value
hook for this (I knew it will be useful), but it needs some more
thinking.
2. concatenation doesn't use the new assignment & preallocation
mechanism, leading to some ridiculous results:

r = 50000;
c = 5000;
ratio = 0.001;
## create a collection of sparse columns
scol = arrayfun (@sprand, r, ones (1, c), ratio, "UniformOutput", false);

disp ("approach 1: built-in concatenation");
tic;
s = [scol{:}];
toc

disp ("approach 2: hand-coded loop");
tic;
nz = sum (cellfun ("nnz", scol));
s = spalloc (r, c, nz);
for i = 1:c
  s(:,i) = scol{i};
endfor
toc

produces this:

approach 1: built-in concatenation
Elapsed time is 13.6298 seconds.
approach 2: hand-coded loop
Elapsed time is 0.0499048 seconds.

i.e. a built-in operation is 270-times slower than an equivalent
interpreted loop (and you thought loops were sluggish? :).
It would be cool if sparse concatenation could take advantage of the
new speed-ups, but it needs some work. Help is much appreciated.

3. simple grep query shows that the interpreter uses nzmax at
suspiciously many places. I bet some of those usages are wrong and
should be nnz, but it requires individual approach. So get ready for
mysterious crashes, just in case.

4. if you build the tip, you'll see 3 new test failures. These are
tests for cases like the following:

a = sparse (2, 0);
a(5) = 1;

previously Octave produced a 5x1 matrix. However, Matlab 2007a
produces a 2x3 matrix (i.e. adds rows as needed). In absence of more
information, I implemented the 2007a approach, causing the tests to
fail. I suppose the tests were done against a different version, but I
don't know whether it was newer.
If anyone can clarify this with a more recent version of Matlab, please do.

4. bugs. I'm sure there's a lot of them.

Comments? Suggestions?

enjoy!

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

PS. I feel exhausted today. I'm wondering what heros wrote all that
GNU and other free software I use on daily basis, because although it
is fun it to improve Octave it also costs me lots of time and energy.
I might need to play less with Octave and focus more on publications
to earn my bread. The country is short of money, and those who dabble
in science are going to feel it next year. If only each contribution
was a publication, and each use was a citation :)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]