octave-maintainers
Re: bin2dec behavior different from Matlab?


From: Daniel J Sebald
Subject: Re: bin2dec behavior different from Matlab?
Date: Fri, 16 Mar 2012 22:15:26 -0500

On 03/16/2012 07:38 PM, Rik wrote:
On 03/16/2012 04:53 PM, Daniel J Sebald wrote:

One can do this.  In general, cellstr are slower than using indexing on
character arrays.  I tried the following and it works

s = char (strrep (cellstr (s), " ", ""));
s = strjust (s, "right");


Why is strjust necessary here?
If the white space is removed from the string the string will already be
justified as a consequence.  Remove the strjust() command and benchmark
again.
The algorithm depends on the character matrix being right justified.  The
char function produces a left-justified matrix.  Try 'char ("1", "111")' as
an example.
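
A minimal sketch of the justification point above, using only char and strjust. The variable names are illustrative and not taken from the bin2dec source.

```octave
% char pads shorter rows on the right, so the matrix is left-justified.
left = char ("1", "111");          % rows: "1  " and "111"

% strjust moves the padding to the left, so the least significant
% digit of every row ends up in the last column.
right = strjust (left, "right");   % rows: "  1" and "111"
```

With the right-justified matrix, column k counted from the right always corresponds to the same power of two, which is presumably what the column-wise algorithm relies on.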

OK, we aren't thinking along the same lines. What I'm wondering is whether the bin2dec group of functions could be implemented without the character matrix approach. With the advent of the cell array, the routines that work on character strings stored as rows of a matrix have somewhat fallen out of favor. People writing scripts now are likely to think in terms of a cell array of binary-number strings. That data might come from a file or elsewhere; it is simply more convenient to work with strings held in a cell array.


Also, strrep may not be so efficient because it is general.  It works
with two strings.  This process is only interested in the one character
' ', so the isspace or != test might prove much faster.
You can use indexing for deletion within ordinary arrays but not for cell
strings.  Try ' cstr = {"1 0 1"; "1"}; cstr(isspace(cstr)) = "" ' and it
will simply error out.  regexprep() would work but it is slower than strrep.
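
To make the contrast above concrete, here is a small sketch. It assumes only isspace, logical indexing and strrep; the variable names are illustrative.

```octave
% On a plain character vector, spaces can be deleted by logical indexing.
s = "1 0 1";
s(isspace (s)) = [];               % s is now "101"

% A cell array of strings cannot be indexed that way, but strrep
% accepts a cellstr and processes each element.
cstr = {"1 0 1"; "1"};
cstr = strrep (cstr, " ", "");     % cstr is now {"101"; "1"}
```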

Also, there may be a technique of using cellfun instead of converting
back to char that can save time.
I've benchmarked cellfun many times and it is slower than straight indexing.

There are a lot of optimization methods to explore here.
Feel free to improve the code.  It is available in Mercurial.  The
changeset is 14472:e995b1c97e13.

To create a test matrix I used

tvec = char (randi ([48 49], 1e6, 10));
tvec(randi(1e7, 1e6,1)) = " ";

which creates 1 million 10 digit binary numbers with about 10% of the
values being spaces.

The input you are choosing is a character matrix. Let's also create the equivalent cell array of character strings:

ctvec = cellstr(tvec);

I'm saying that ctvec is more likely to be the user's starting point these days, and that converting ctvec to a character array might not be the way to go.
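
As a rough illustration of what the conversion does, here is a scaled-down sketch. The names tsmall, csmall, lens and frac_spaces are made up for this example. One detail worth noting is that cellstr strips trailing blanks from each row, so the cell elements need not all have the same length.

```octave
% Small deterministic stand-in for the character matrix.
tsmall = ["10 "; "111"; " 01"];
csmall = cellstr (tsmall);         % trailing blanks stripped, leading ones kept
lens = cellfun (@numel, csmall);   % element lengths after conversion

% Scaled-down version of the random test data: 1e4 rows of 10 digits.
tvec = char (randi ([48 49], 1e4, 10));
tvec(randi (1e5, 1e4, 1)) = " ";
% randi may draw the same index twice, so the fraction of spaces
% comes out slightly below 10%.
frac_spaces = mean (tvec(:) == " ");
```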

Unfortunately, I don't have the latest Octave and until I can get Mercurial working on my machine I can't do benchmark comparisons.

From my rough estimate, you have a fast machine. I have a quad-core Xeon running at 3 GHz and I'm not getting anywhere near your times with bin2dec. Perhaps there has been a lot of optimization in Octave over the past couple of versions.

Well, here is what I'm doing for a simple test, and you can experiment with this little bit of code on your machine. I'm attaching a script file called test_bin2dec.m which uses cellfun() to implement bin2dec. It is bare bones and does not check that every character is '0' or '1', but my point is to illustrate that there is a different approach.
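
For readers without the attachment, the general idea can be sketched as follows. This is not the attached test_bin2dec.m; the helper name bits2dec is hypothetical, and like the attachment it does no validation of the characters. It assumes each cell element is a row string of '0', '1' and possibly spaces.

```octave
% Hypothetical per-string helper: drop spaces, turn the characters into
% 0/1 digits, and weight them by descending powers of two.
bits2dec = @(s) sum ((s(s != " ") - "0") .* power (2, (sum (s != " ") - 1):-1:0));

% cellfun applies the helper to every element of the cell array and
% collects the scalar results into a numeric column.
d = cellfun (bits2dec, {"101"; "1 1"; "1"});   % 5, 3 and 1
```

Because the helper returns a scalar for each string, cellfun can return an ordinary numeric array and no "UniformOutput" option is needed.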

Even if cellfun() turns out to be slightly slower, the cleaner code might still work in its favor. Or maybe we'll have to ask John to look at the internal cellfun() routine, because the whole point of that routine is looping efficiency.

Running this little script

  tvec = char (randi ([48 49], 1e6, 10));
  tvec(randi(1e7, 1e6,1)) = " ";
  ctvec = cellstr(tvec);

  cpuzero = cputime();
  junk = bin2dec(tvec);
  cputime() - cpuzero

  cpuzero = cputime();
  junk = test_bin2dec(ctvec);
  cputime() - cpuzero

on my machine produces

octave:19> tvec = char (randi ([48 49], 1e6, 10));
octave:20> tvec(randi(1e7, 1e6,1)) = " ";
octave:21> ctvec = cellstr(tvec);
octave:22>
octave:22> cpuzero = cputime();
octave:23> junk = bin2dec(tvec);
octave:24> cputime() - cpuzero
ans =  423.04
octave:25>
octave:25> cpuzero = cputime();
octave:26> junk = test_bin2dec(ctvec);
octave:27> cputime() - cpuzero
ans =  68.398

So two questions come to mind from this:

1) The cellfun()-based approach is roughly six times faster (68.4 s versus 423.0 s of CPU time) than the version 3.2.4 approach, though granted, I left out several things. The conversion ctvec = cellstr(tvec) is fast relative to these benchmark times, so maybe moving to a cell-array approach is better. (John's point, I believe.) There may be a better approach than the power() routine, but I was just trying to illustrate cellfun().

2) What machine are you using that is so fast?!

Dan

Attachment: test_bin2dec.m
Description: Text Data

