octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Improved triu.m and tril.m


From: Jaroslav Hajek
Subject: Re: Improved triu.m and tril.m
Date: Thu, 22 Oct 2009 21:54:18 +0200

2009/10/22 Marco Caliari <address@hidden>:
> Dear maintainers,
>
> although triu and tril are built-in functions in Matlab and then much
> faster, I discovered that it is possible to improve the m files in Octave,
> by eliminating the min/max evaluations inside the loops and selecting
> whether to replace numbers with zeros or zeros wih numbers.
> The results, for my Octave 3.2.3, are:
>
>> A=rand(100);
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -31
> Elapsed time is 0.0221 seconds.
> Elapsed time is 0.00397 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -52
> Elapsed time is 0.0224 seconds.
> Elapsed time is 0.00114 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  22
> Elapsed time is 0.0174 seconds.
> Elapsed time is 0.00495 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  63
> Elapsed time is 0.00873 seconds.
> Elapsed time is 0.00261 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  36
> Elapsed time is 0.0146 seconds.
> Elapsed time is 0.00413 seconds.
> ans = 0
>
> For me the speed-up is > 3. Enclosed, the patches wrt 3.2.3.
>
> Best regards,
>
> Marco

Interesting. For comparison, here is my benchmark:

n = 500;
A = rand (n);
k = 1;
tril (eye (2)); # to avoid startup penalty
for iter = 1:4
  tic; tril (A, k); toc
endfor

my tip gives:

Elapsed time is 0.0236731 seconds.
Elapsed time is 0.0239391 seconds.
Elapsed time is 0.0221 seconds.
Elapsed time is 0.022387 seconds.

the relevant code snippet is

  retval = resize (resize (x, 0), nr, nc);
  for j = 1 : min (nc, nr+k)
    nr_limit = max (1, j-k);
    retval (nr_limit:nr, j) = x (nr_limit:nr, j);
  endfor

one can save a lot by doing a vectorized max once and eliminating the
repeated range:

  retval = zeros (nr, nc, class (x));
  jj = 1:min (nc, nr+k);
  ii = max (1, jj - k);
  for j = jj
    i = ii(j):nr;
    retval (i, j) = x (i, j);
  endfor

this gives:

Elapsed time is 0.0148931 seconds.
Elapsed time is 0.0148301 seconds.
Elapsed time is 0.0152121 seconds.
Elapsed time is 0.014981 seconds.

using cellslices + vertcat + reshape is faster still:

  m = min (nc, nr+k);
  ii = max (1, (1:m) - k);

  sl(2,:) = cellslices (x(:), ii, nr*ones (1, m));
  sl(1,:) = cellslices (zeros (nr, 1), ones (1, m), ii - 1);

  retval = reshape (vertcat (sl{:}), nr, nc);

gives

Elapsed time is 0.00466204 seconds.
Elapsed time is 0.00521612 seconds.
Elapsed time is 0.00525379 seconds.
Elapsed time is 0.00514317 seconds.

for comparison, your patch:

Elapsed time is 0.00818515 seconds.
Elapsed time is 0.00864601 seconds.
Elapsed time is 0.00886297 seconds.
Elapsed time is 0.00826406 seconds.

(quite good, compared to the loop-free code). Winner is number 3.
triu can be handled similarly. Neither of these approaches works
efficiently for sparse functions, though, so it should get a special
code (probably using find). But anyway, I think implementing
triu/tril/vech as compiled functions should be trivial and likely the
fastest.

Out of interest, do you depend on triu/tril efficiency in an application?

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz



reply via email to

[Prev in Thread] Current Thread [Next in Thread]