octave-maintainers

Re: 3D versus 2D Indexing and the Speed Thereof


From: Bob Weigel
Subject: Re: 3D versus 2D Indexing and the Speed Thereof
Date: Thu, 12 Apr 2007 17:10:14 -0400
User-agent: KMail/1.8.2

>
>     // First do one element to force the copy-on-write
>     elem(dest_offset) = source.elem (source_offset);
>     raw_source = &(source.rep->data[source_offset]);
>     raw_dest = &(rep->data[dest_offset] );
>
>     for (octave_idx_type i = 0; i < block_count; i++)
>      {
>        memcpy (raw_dest, raw_source, sizeof(T)*element_count);
>        raw_source += source_stride;
>        raw_dest += dest_stride;
>      }
>   }
>

I am not sure if this is related, but I have wanted to speed up Octave's
repmat (and zeros and ones) functions.  As currently implemented, repmat.m
uses some impressive vectorization tricks, but it is orders of magnitude
slower than other implementations.  The way I was going to approach it was
to use the method from repmat.c at
http://research.microsoft.com/~minka/software/lightspeed.

I don't fully understand the C++ code that was posted in this thread, but I
suspect that it uses the same method as the "lightspeed" repmat.c.  If it
does, then ignore this post.

The relevant part of repmat.c is as follows.  

/* repeat a block of memory rep times */
void memrep(char *dest, size_t chunk, int rep)
{
#if 0
  /* slow way */
  int i;
  char *p = dest;
  for(i=1;i<rep;i++) {
    p += chunk;
    memcpy(p, dest, chunk);
  }
#else
  /* fast way */
  if(rep == 1) return;
  memcpy(dest + chunk, dest, chunk);
  if(rep & 1) {
    dest += chunk;
    memcpy(dest + chunk, dest, chunk);
  }
  /* now repeat using a block twice as big */
  memrep(dest, chunk<<1, rep>>1);
#endif
}
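
To make the doubling idea concrete, here is a minimal sketch of how I read
it.  It is an iterative variant, not the lightspeed code itself: the names
memrep_doubling and tile_doubles are hypothetical, and the guard for
rep <= 1 is my own addition (as quoted, memrep assumes rep >= 1).  Each pass
copies the already-filled prefix onto the space right after it, so the
filled region doubles until the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Repeat the first `chunk` bytes of dest until dest holds `rep`
   consecutive copies.  The caller must provide chunk * rep bytes.
   Hypothetical iterative variant of the doubling trick. */
static void memrep_doubling(char *dest, size_t chunk, size_t rep)
{
  size_t filled = chunk;   /* bytes that already hold valid copies */
  size_t total;

  /* Guard added for this sketch: nothing to do for 0 or 1 copies. */
  if (rep <= 1 || chunk == 0) return;

  total = chunk * rep;
  while (filled < total) {
    /* Copy as much of the filled prefix as still fits. */
    size_t n = filled;
    if (n > total - filled) n = total - filled;
    /* Source [0, n) and destination [filled, filled + n) never
       overlap because n <= filled, so memcpy is safe here. */
    memcpy(dest + filled, dest, n);
    filled += n;
  }
}

/* Hypothetical helper: tile a pattern of n doubles `times` times
   into out, which must have room for n * times doubles. */
static void tile_doubles(double *out, const double *pattern,
                         size_t n, size_t times)
{
  assert(out != NULL && pattern != NULL);
  if (n == 0 || times == 0) return;
  memcpy(out, pattern, n * sizeof *pattern);
  memrep_doubling((char *) out, n * sizeof *pattern, times);
}
```

Calling tile_doubles (out, pat, 3, 5) with pat = {1, 2, 3} should fill out
with the pattern five times over.  The point of the trick is that it needs
roughly log2(rep) memcpy calls on ever larger blocks, instead of rep - 1
calls on small ones as in the "slow way" branch.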

