swarm-support
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Swarm-Support] Random Number Generator and Distributions(continuing


From: glen e. p. ropella
Subject: Re: [Swarm-Support] Random Number Generator and Distributions(continuing)
Date: Mon, 02 Feb 2009 16:56:27 -0800
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

Thus spake Paul Johnson circa 30/01/09 09:13 AM:
> Glen Ropella was explaining that creating a lot of rng objects in that 
> way was dubious, or possibly that it was just too slow.  I'm scanning my 
> email to find the explanation, but can't turn it up.  Maybe he is 
> reading now and he can remember the argument.  He had an alternative 
> generator approach in mind for a case like that, I think it was 'lookup 
> table' or something similar to that.

Yes, it is dubious because multiple pseudo-random number generators
(prngs) will produce a kind of interference, especially if they all use
the same algorithm (e.g. MT19937).  The period of a single prng is very
long.  If multiple prngs are used, you are effectively aggregating the
prngs into one massive prng.  And the aggregate has a _shorter_ period
than a single one in isolation.  This is true even if the calls to the
prngs are entirely independent, because the behavior of the whole
simulation can exhibit periodicity artifacts even though the prn streams
from each prng are not repeating.

You can look at it from an autocorrelative perspective.  To what extent
is one draw from a prng related to another draw from that same prng?
Well, since these are _deterministic_ algorithms, any given draw (d) has
a 1st order relationship to the draws immediately before (d-1) and after
(d+1) it, a 2nd order relationship to the draws d-2 and d+2, a 3rd order
relationship to draws d-3 and d+3, etc. all the way up to d - period/2
and d + period/2.

I haven't done the math; but I assume a prng-driven simulation with 2
rngs (using the same algorithm) can exhibit a period of 1/2 the period
of the original generator (best case).  For MT19937, this might not be a
big deal.  The period of MT19937 is (2^19937)-1, divide that by 2 and
you get (2^19936)-1/2.  And that's still ginormous.

More practically, let's say you wanted a prng for each agent and you had
1 gig of agents (2^30), each of which made 1 prng draw per cycle.
Ignoring the "-1", your simulation would have a period of 2^19907 (again
this is the _best_ case).  If you had 2^19937 agents, your simulation
would have a period of 1.

But that's _assuming_ the best case, where the prng is really robust.
Most prngs aren't that robust.  As soon as you start playing games like
this, shortening the period, you may hit a degenerate case where, even
though you only have, say, 1024 agents, you see periodic artifacts.
This can happen if the various _regions_ of the large prng are
correlated somehow.  E.g. perhaps the 1st 2^10 numbers in mt19937 map
isomorphically to the 137th set of 2^10 numbers (perhaps the numbers in
the 137th 2^10 can be found by adding 3 to those of the 1st).  Or, even
more generally, if you start one prng at node X in the cycle and another
prng at node Y in the cycle, the value of every draw from the second
prng might be predictable from the value of every draw from the first prng.

>From a complex model perspective, there's another issue.  Many
agent-based models explore local and nonlocal information and how that
affects the model.  Multiple prngs can inscribe global information into
your simulation.  E.g. if you start prng_1 at node X and prng_2 at node
X+1, then the agent using prng_2 will always "know" the value of prng_1
in the other agent (because it got that number in the previous cycle).

So, if the point of your model was to evolve agents that can predict the
behavior of other agents, then using multiple prngs can trick you into
thinking your simulation does that, when in reality, it's an artifact of
your prngs.

I'm not suggesting any of this is likely to happen.  But when you use
multiple prngs of the same algorithm, what you're really doing is
hunting for a randomness test under which the prng is demonstrably
predictable.  (See: http://en.wikipedia.org/wiki/Diehard_tests)

By sticking to a single prng (with many distributions hanging off it),
you can be sure to avoid these pitfalls.  An alternative strategy is to
use a service like:  http://random.org.  There are also large genuine
random number datasets you can purchase or download.

-- 
glen e. p. ropella, 971-222-9095, http://agent-based-modeling.com



reply via email to

[Prev in Thread] Current Thread [Next in Thread]