octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #54619] randi() is biased


From: Rik
Subject: [Octave-bug-tracker] [bug #54619] randi() is biased
Date: Thu, 6 Sep 2018 14:24:07 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0

Follow-up Comment #3, bug #54619 (project octave):

the rand from Octave is implemented in liboctave/numeric/randmtzig.cc.  As far
as I can tell, there is no 64-bit integer that is cast to double.  Similarly,
there is no "division + floor" in randi.  There is multiplication, and floor
is used to remove the fractional part.  Because I can't verify some of the
first assertions, I can't go on to accept the conclusions that follow.  The
Stack Overflow response has this comment


The suggestion to use floating-point division is mathematically plausible but
suffers from rounding issues in principle. Perhaps double is high-enough
precision to make it work; perhaps not. I don't know and I don't want to have
to figure it out; in any case, the answer is system-dependent.


So it's not clear, even to the original author, that Octave's current system
doesn't work well enough.  Before any large project is undertaken it makes
sense to have an idea of the costs and the benefits.  It seems to me that the
benefits need to be quantified.  How large is the bias of the existing system
(if it even exists) that is implemented in Octave?

Also, relevant, are we any worse than the competition?  According to The
Mathworks, the algorithm Octave implements is the one suggested for random
double numbers (http://www.mathworks.com/help/matlab/ref/rand.html)


Generate a 10-by-1 column vector of uniformly distributed numbers in the
interval (-5,5).

r = -5 + (5+5)*rand(10,1)

r = 10×1

    3.1472
    4.0579
   -3.7301
    4.1338
    1.3236
   -4.0246
   -2.2150
    0.4688
    4.5751
    4.6489

In general, you can generate N random numbers in the interval (a,b) with the
formula r = a + (b-a).*rand(N,1). 


For random integers, The Mathworks suggestion is to use randi(), but I don't
know if they implement something special there or not.

As an attempt at quantifying the bias, I used


x = randi ([1, 10], 1e8, 1);
[nn,xx] = hist (x, 1:10);
errpct = abs (nn - 1e7) / 1e7 * 100
errpct =

 Columns 1 through 8:

   0.0376100   0.0484100   0.0173600   0.0174000   0.0120900   0.0249200  
0.0092400   0.0054200

 Columns 9 and 10:

   0.0250300   0.0050400


So, as much as .05% deviation, but this could be statistical fluctuations. 
Running a second time I get


errpct =

 Columns 1 through 8:

   0.0057600   0.0249700   0.0160600   0.0201400   0.0295600   0.0249300  
0.0148500   0.0361300

 Columns 9 and 10:

   0.0643700   0.0242500


This time it is a .06% deviation, but it is not in the same bin as before so
it doesn't look like systematic bias.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54619>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]