Re: use of the random library

swarm-support

From:	mcmullin
Subject:	Re: use of the random library
Date:	Fri, 28 Mar 1997 10:40:05 -0700 (MST)

Hi Guys -

Since I was closely involved with Glen, Roger and Sven
in designing the *interface* for the new random number
library (and thereby earned my Official SwarmFest
StinkBug!) I'd better toss in my tuppence worth to this
discussion. I had hoped we would have time for a
SwarmFest session on it, but there really wasn't enough
interest...

The first point is on the functionality of the global,
pre-created distributions and generator in
Swarm-1.0.x, and especially the fact that the generator
gets initialised to the same state on every run.
Please pay especial attention to this comment in the
simtools docs (simtools is where these objects are
provided):

    id <PMMLCG> randomGenerator, 
    id <UniformInteger> uniformIntRand 
    id <UniformUnsigned> uniformUnsRand 
    id <UniformDouble> uniformDblRand 

    Three global random number distributions are defined in
    case you're too lazy to define your own. Use of these
    globally defined distributions and this generator is
    DEPRECATED. It is seeded with 1234567890 everytime the
    system starts. For more sophisticated simulation you
    should use the random number library to its full
    potential: see the documentation on random for more
    detail.


These global objects were provided specifically to make
porting apps from the Swarm beta release easier.  Given
this objective, and with the wonderful wisdom of
hindsight, the semantic behaviour of trying to pick a
distinct initial generator state on each run should
arguably have been retained.  As Sven has said,
changing that was certainly not *his* idea (my memory
is a bit fogged, but the idea was almost certainly mine
- mea culpa!).

So what *was* I thinking of?

You'll note the comment above about use of these global
objects being deprecated.  My real argument was (and
is) that these objects should not be provided *at all*.
This is because providing these "canned" objects
*hides* some important things from the app developer -
things he or she really *should* think about and
understand before *relying* on any results of running
the app. But I was persuaded (correctly!) that the need
to ease porting from the beta justified retaining
them. I still harbour the idea that they may well
disappear in a future Swarm release; I urge people to
take the note quoted above seriously, and *not* to use
these objects in new apps, and to *remove* usage from
existing apps at the first convenient opportunity.  

I may be re-writing history a little, but I *think* the
logic of changing the initialisation semantics (from
trying to get different initial states to simply always
using a fixed initial state) was partly because the
method previously in use for this was IMHO "horrible"
(i.e. we can say absolutely nothing useful about its
statistics), and partly to force people who were
relying on this to realise that they really shouldn't.

And it seems to have achieved that, so maybe it wasn't
such a bad idea after all (;-)

Final point on these canned objects:  in a previous
message, Sven emphasised the following point:

> b) all the 3 uniform objects draw their random numbers from the
> SAME generator.

I'm not sure what particular significance Sven intended
to attach to this (though I'll return to that below).
But let me say that, as a general principle, I believe
one should *not* use more than one generator instance
in any given app.  The reason is simply that, if you
use multiple instances there may well be unexpected
correlations between the streams from the different
instances (this is obviously especially the case if you
use multiple instances of the *same* generator class,
but it *can* arise even with instances from different
classes).  Whereas: if you use a single instance of a
"good" generator, the designers of that generator are
providing some kind of analytic and/or numerical
assurances that you will not experience such unexpected
correlations (that is the *only* thing that qualifies
one of these things as a "random" number generator in
the first place!).  Conversely, if you mix multiple
generators, all bets are off, and the designers of the
generators can give you *no* assurances whatsoever
about the statistical quality of what happens.
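
To make the single-generator recommendation concrete, here is a
minimal sketch in Python (not Swarm's Objective-C API). TinyLCG and
UniformInt are hypothetical names invented for this illustration; the
generator uses the well-known Park-Miller constants and is not claimed
to match the Swarm PMMLCG class. It shows two distributions sharing one
generator, and then the most extreme form of cross-instance
correlation: two instances of the same class started from the same
seed.

```python
# Illustrative only: TinyLCG and UniformInt are hypothetical classes,
# not part of the Swarm random library.

class TinyLCG:
    """Small multiplicative congruential generator (Park-Miller constants)."""
    MODULUS = 2**31 - 1
    MULTIPLIER = 16807

    def __init__(self, seed):
        # The state must lie in 1..MODULUS-1, so map a zero result to 1.
        self.state = seed % self.MODULUS or 1

    def next_unsigned(self):
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state


class UniformInt:
    """Distribution object that draws from whatever generator it is given."""

    def __init__(self, generator, low, high):
        self.generator = generator
        self.low = low
        self.high = high

    def sample(self):
        # Simple modulo reduction; slightly biased, but fine for a sketch.
        span = self.high - self.low + 1
        return self.low + self.generator.next_unsigned() % span


# Recommended pattern: every distribution consumes one shared stream.
shared = TinyLCG(1234567890)
dice = UniformInt(shared, 1, 6)
coin = UniformInt(shared, 0, 1)
shared_draws = [(dice.sample(), coin.sample()) for _ in range(5)]

# Pattern to avoid: two instances of the same class. With equal seeds
# the two streams are identical, i.e. perfectly correlated.
gen_a = TinyLCG(1234567890)
gen_b = TinyLCG(1234567890)
stream_a = [gen_a.next_unsigned() for _ in range(5)]
stream_b = [gen_b.next_unsigned() for _ in range(5)]
lockstep = stream_a == stream_b  # True
```

Equal seeds are of course the worst case; the concern above is that
subtler correlations can arise even when the seeds, or the generator
classes, differ.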


Now let me move on to this from address@hidden
(Bob Bell):

>Anyway, it appears as if I need to set a new seed for
>each run. I suppose if I made them sequential (kind
>of a seed++ thing) that that would induce a correlation.
>Know of a way to do this? The positive side here is
>that by knowing my seeds I can reproduce any unusual
>run that might occur :-) or should I use the canonical
>[grin] from glen.

Sven has already answered this in detail.  I would add
that using sequential seeds is no worse a way of doing
it than any other - i.e. does not, a priori, have a
greater chance of introducing unexpected correlations
than any other.  But of course it *might*.

For preference, though, that risk should be avoided.  As Sven has
explained, the *desirable* thing would be to ensure
that all your runs use disjoint fragments of the
generator state space.  This firstly requires a
generator with a big enough state space; and on your
numbers, that currently means SWB. Secondly, you need
to have a way of picking points in the state space
which get to long enough, non-overlapping, stretches of
states.  That is the tricky bit. Again, as Sven says,
he has identified some very promising avenues for
making this a lot easier than it is at the moment - but
that's still only in the pipeline.  For now, you might
indeed use sequential (or otherwise systematically
chosen) seeds for the different runs (the seed space is
smaller than the state space for SWB, but the theory is
that the mapping from seed to state is such that the
seed space is "well distributed" over the state space;
but frankly, we can make very few if any guarantees
about that at the moment).  
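
As a rough illustration of systematically chosen seeds, the following
Python sketch records the seed used for each run so that any unusual
run can be replayed later. It uses Python's standard random module
rather than the Swarm library, and BASE_SEED, seed_for_run and run_once
are hypothetical names chosen for this example. It makes no claim about
how well such seeds are spread over the generator's state space.

```python
import random

BASE_SEED = 1000  # hypothetical starting seed for the experiment


def seed_for_run(run_index):
    """Sequential seeds: run 0 uses BASE_SEED, run 1 uses BASE_SEED + 1, ..."""
    return BASE_SEED + run_index


def run_once(seed, n_draws=5):
    """Stand-in for one simulation run driven by a single generator."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n_draws)]


# Keep a log from seed to results so that runs can be reproduced.
run_log = {}
for run_index in range(4):
    seed = seed_for_run(run_index)
    run_log[seed] = run_once(seed)

# Replaying run 2 from its recorded seed gives exactly the same draws.
replayed = run_once(seed_for_run(2))
```

Reproducibility is guaranteed by this scheme; freedom from correlation
between the runs is not.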

HOWEVER: *if* your different runs are being done
*sequentially* then there is a method of *guaranteeing*
that each run gets a disjoint fragment of the SWB state
space.  This is simply to extract the SWB state at the
end of each run and use that as the starting state for
the next run.  If the runs *are* sequential, I strongly
recommend this approach.
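
The state hand-over can be sketched as follows, again in Python and
again only as an illustration: Python's random.Random is a Mersenne
Twister, not SWB, and run_simulation and run_sequentially are
hypothetical names. The point is only the mechanism of extracting the
generator state at the end of one run and restoring it at the start of
the next.

```python
import random


def run_simulation(rng, n_draws):
    """Stand-in for one run: consume n_draws numbers from the generator."""
    return [rng.random() for _ in range(n_draws)]


def run_sequentially(n_runs, n_draws, seed):
    """Chain runs so that each starts from the state the previous one left."""
    saved_state = random.Random(seed).getstate()
    results = []
    for _ in range(n_runs):
        run_rng = random.Random()
        run_rng.setstate(saved_state)       # resume where the last run ended
        results.append(run_simulation(run_rng, n_draws))
        saved_state = run_rng.getstate()    # extract state for the next run
    return results


runs = run_sequentially(n_runs=3, n_draws=4, seed=1234567890)

# The chained runs use consecutive, non-overlapping stretches of the
# same stream that a single uninterrupted generator would produce.
reference = random.Random(1234567890)
one_long_stream = [reference.random() for _ in range(12)]
flattened = [x for run in runs for x in run]
```

If the runs are separate program invocations, the saved state would
have to be written to disk between them (for instance with pickle), and
the stretches stay disjoint only while the total number of draws is
below the generator's period.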

Now let me comment on Rick Riolo's comment:

> It seems to me that whether its acceptable
> to have the RN sequences overlap over different
> runs of an experiment will be model dependent.
> 
> For example, if the model is going through some
> evolution that is generating new mixes of populations
> over the course of the runs, the same (sub)sequence of RNs, 
> when it is used at different times in the runs, 
> is very unlikely to be used "in the same way," 
> i.e., in some way that will introduce undesirable 
> correlations across the runs.
> 
> of course if the sequences are NOT overlapping,
> one doesn't have to worry about whether one's
> model will not be adversely affected by sharing
> subsequences of RNs at different points in runs,
> making such RNG's a goal to shoot for.

I essentially agree with everything he says - but
*especially* his last remark: the *best* situation, *if
you can possibly achieve it*, is to ensure that the
sequences used in the different runs are not
overlapping...

Finally (phew!) a few critical (sorry!) remarks about
Sven's "myRandom":

o myRandom.h #import's header files from the random
subdir rather than just #import'ing random.h.  It's not
apparent to me why - since, as far as I can see, he
only wants to use some of the protocol declarations
which are in random.h.  In general, it seems to me a
bad idea to #import the more specific files - because
they are *not* guaranteed to remain compatible between
releases.

o Much more importantly: when creating his own
distribution objects, Sven has elected to use multiple
instances of generator objects to drive them.  Granted,
he has used instances of different classes; but, for
the reasons I already explained above, I don't think
you should *ever* use more than one generator object in
a given app. (OK: there are *very special*
circumstances in which this might be a good idea: but,
my basic recommendation is: use just one generator
object unless you *really really* know what you are
doing...)  I should add that I have discussed this
issue with Sven, but I don't think either of us has
ever managed to persuade the other, so I think we'll
have to agree to differ ... unless someone else out
there can add anything further?


- Cheers,

Barry.


-- 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Barry McMullin, ALife Group,               |    address@hidden |
| Santa Fe Institute, 1399 Hyde Park Road,   |  Voice: +1-505-984-8800 |
| Santa Fe, NM 87501, USA.                   |  FAX:   +1-505-982-0565 |
| http://www.eeng.dcu.ie/~mcmullin           |                         |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++