octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bin2dec behavior different from Matlab?


From: Rik
Subject: Re: bin2dec behavior different from Matlab?
Date: Fri, 16 Mar 2012 16:14:28 -0700

On 03/16/2012 11:05 AM, John W. Eaton wrote:
> On 16-Mar-2012, Rik wrote:
>
> | On 03/16/2012 04:53 AM, Jordi Gutiérrez Hermoso wrote:
> | > On 16 March 2012 01:30, Daniel J Sebald <address@hidden> wrote:
> | >> This change happened with changeset 11172:7e8ce65f73cf shown here
> | >>
> | >> http://hg.savannah.gnu.org/hgweb/octave/rev/7e8ce65f73cf
> | >>
> | >> which overhauled the string number conversion routines. (Maybe the intent
> | >> was to move white space removal inside base2dec.)
> | >>
> | >> In my opinion tossing away white space makes better sense than treating
> | >> white space as zeros.
> | >>
> | >> I also suggest adding a test to the file. There currently are several, but
> | >> none that test the behavior of white space. E.g.,
> | >>
> | >> %!assert (bin2dec ("1 0 1"), 5)
> | > This seems like a clear unintentional regression. Can you prepare a changeset?
> | 3/16/12
> |
> | Daniel,
> |
> | The change is unintentional, however the fix is not absolutely
> | straightforward. Matlab's version of the function is quite limited in that
> | it works on just a single string. Octave's version also accepts character
> | matrices (one string per row) and cell array of strings. In base2dec the
> | space character is treated as 0 so that the following will work
> |
> | bin2dec ([" 101";
> | "1101"])
> |
> | For the first string in the char matrix , the leftmost value is 2^3
> | (determined by position) * 0 (value of space) so that any spaces used for
> | padding out a character matrix do not influence the resulting conversion.
> |
> | The code for 3.2.4 handled "1 0 1" by using a for loop over every row of
> | the character matrix to remove spaces BEFORE calling base2dec (str,2).
> | Looping in Octave is abysmally slow and not the way to do this. Even if we
> | have to go back to some sort of looping structure it would be better to use
> | regexprep, arrayfun, or cellfun.
> |
> | One option might be to write a custom string function that both right
> | justifies and removes excess spaces. This would replace this part of
> | base2dec.m starting at line 84.
> |
> | ## Right justify the values before anything else.
> | s = strjust (s, "right");
> |
> | This hypothetical function, strjustsqueeze [justify and squeeze] can be
> | based on strjust.m which is efficient.
> |
> | Perhaps you can investigate and propose some possible solutions and we can
> | take it from there.
>
> What about taking character matrices and converting them to cell
> arrays, then using cellfun to apply strrep to the elements of the cell
> array, then processing as we do now?


One can do this.  In general, cellstr are slower than using indexing on character arrays.  I tried the following and it works

s = char (strrep (cellstr (s), " ", ""));
s = strjust (s, "right");

I also built a hairy-looking index-based solution which right justifies and squeezes spaces.  The code for that is

## Right justify the values and squeeze out any spaces.
## This looks complicated, but indexing solution is very fast
## compared to alternatives which use cellstr or cellfun or looping.
[nr, nc] = size (s);
s = s.';
nonbl = s != " ";
num_nonbl = sum (nonbl);
nc = max (num_nonbl);
num_blank = nc - num_nonbl;
R = repmat ([1 2; 0 0], 1, nr);
R(2, 1:2:2*nr) = num_blank;
R(2, 2:2:2*nr) = num_nonbl;
idx = repelems ([false, true], R);
idx = reshape (idx, nc, nr);

## Create a blank matrix and position the nonblank characters.
s2 = repmat (" ", nc, nr);
s2(idx) = s(nonbl);
s = s2.';

The benchmarking is pretty clear.  For 1 million 10 digit binary numbers, with random spaces in them, the cellstr approach takes 8.25 seconds and the indexing solution takes 1.008 seconds.  Even for the case of the single string "1 0 1 " the indexing solution is 15% faster.

Since this is a regression, I plan to commit this code onto stable and add a test for this behavior.

Cheers,
Rik


reply via email to

[Prev in Thread] Current Thread [Next in Thread]