octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 3D versus 2D Indexing and the Speed Thereof


From: Luis F. Ortiz
Subject: Re: 3D versus 2D Indexing and the Speed Thereof
Date: Wed, 29 Nov 2006 11:02:29 -0500

On Tue, 2006-11-21 at 03:16 -0500, John W. Eaton wrote:
> OTOH, if Octave's Array class knew how to do slices and if we
> recognized these operations as slices, then the time should be
> constant here, not proportional to the number of elements copied.

So, lets pretend that I added a method to the Array class that looked
something like this:

        // Maybe this should be in ArrayRep??
        template <class T>
        void
        Array<T>::strip_gather(Array<T>& source, octave_idx_type
        dest_offset, octave_idx_type source_offset,
                           octave_idx_type element_count , 
                           octave_idx_type strip_count ,
                           octave_idx_type strip_stride)
        {
                T *raw_source, *raw_dest;
                raw_source = & ( source.xelem(source_offset) );
                raw_dest = & ( xelem(dest_offset) );
                for (octave_idx_type i = 0 ; i < strip_count; ++i)
                 {
                    bcopy(raw_source,raw_dest,sizeof(T)*element_count);
                    raw_source += strip_stride;
                    raw_dest += element_count;
                 }
        }

What this method would do is gather strips consisting of a fixed number
of elements separated by a fixed stride into a contiguous strip. 
If we then combined this with some functions which could detect some
special cases, like the following:

        // Find a single non-colon equivalent index.
        // Returns -1 if there is more than one or there are none
        octave_idx_type
        nearly_colon_equiv (const Array<idx_vector>& ra_idx,
                         const dim_vector& frozen_lengths)
        {
          octave_idx_type idx_n = ra_idx.length ();
          int n = frozen_lengths.length ();
          octave_idx_type not_colon_equiv = -1;
          assert (idx_n == n);
          for (octave_idx_type i = 0; i < n; i++)
            {
              if ( ! ra_idx(i).is_colon_equiv (frozen_lengths(i)))
                {
                  if ( not_colon_equiv == -1 )
                    not_colon_equiv = i;
                  else
                    return -1;
                }
            }
          return not_colon_equiv;
        }

We could then think about handling a variety of special cases.  Here is
some pseudo-code for some of the 1D/2D/nD cases that could be done
faster with these new functions:

**1-Dimensional Special Cases

B=A(x:z)        B.strip_gather (A, 0, x, z–x+1, 1, 0);

B=A(x:y:z)      B.strip_gather (A, 0, x, 1, 1+ceil((z-x)/y,y);


**2-Dimensional Special Cases

                % S(j) is the size of array A along index j
row select:     B=A(:, i)       
                B.strip_gather (A,0,idx2elem(1,i), S(0), 1, 0); 
                                
                B=A(:,x:z)      
                B.strip_gather (A,0,idx2elem(1,x), S(0), z-x+1, S(0));

column select:  B=A(i, :) 
                B.strip_gather(A,0,idx2elem(i,1),1, S(1), S(0));
                B=reshape(B,[1 S1]);

                B=A(x:z, :) 
                B.strip_gather(A,0,idx2elem(x, 1), z-x+1, S(1), S(0));
                B=reshape(B,[z-x+1 S1]);

**N-Dimensional Special Cases

First select:   B=A(i,: … :) 
                A1=reshape(A,[A.S(0) S(1)*S(2)*...S(n-1) ); 
                B= A1(i,:);
                B=reshape(B,[ 1 A.S(1) S(2) … S(n-1)]


Last select:    B=A(: … :,i) 
                A1=reshape(A,[A.S(0)*..*S(n-2) S(n-1));
                B= A1(:,i);
                B=reshape(B,[A.S(0) S(1) … S(n-2)]


Middle select:  B=A(: …:,i,: …:) 
                % i is the dth index
                A1=reshape(A,[S(0)*..S(d-1) S(d) S(d+1)*..S(n-1) ); 
                B.strip_gather( A1, 0, idx2elem([1 … i .. 1]),A1.S0,
                                A1.S2,A1.S0* A1.S1);
                B=reshape(B,[A.S(0) A.S(1) … 1 … A.S(n-1)]


Do folks think these optimizations would be useful?  
Is the implementation the right approach?

Symmetric improvements to the subs-assignment could be made by adding
the reverse of strip_gather (its evil twin strip_scatter). The
dest_offset argument, which is zero for all the cases presented here,
would be useful if one chose to add another level of looping on top
of strip_gather to handle additional cases or more strided cases. 

I think this could result in an improvement of over a factor of 10X for
the handled cases.   It would only work, however, for the dense storage
classes and would not be useful for any of the sparse cases.


Luis F. Ortiz
Manager of Software Development
email: address@hidden
work : +1 781 419 5059
fax  : +1 781 419 6050



reply via email to

[Prev in Thread] Current Thread [Next in Thread]