swarm-support
Re: Robustness Check


From: Sven N. Thommesen
Subject: Re: Robustness Check
Date: Thu, 08 Jul 1999 19:51:43 -0500

At 04:44 PM 7/8/1999 -0400, Ted Belding wrote:
>On Thu, 8 Jul 1999, Jan Kreft wrote:
>
>> Regarding #1 and #2, it would be better to start the program only once
>> with one seed and do all 100 runs in a sequence. This way, you avoid the
>> possibility of overlap of the sequence of numbers you draw from the random
>> number generator. Also, you have only one file with all results that you
>> can analyse in one go.
>
>In theory, this would be ideal. In practice, it isn't so important. It's
>far more important to use a good, well-tested random number generator
>(RNG) that has a long enough period before it starts to repeat (say at
>least 10000 times as long as the maximum total number of calls that you're
>going to make for all of the runs). You can generally just seed the RNG
>for each run from the current time, or something similar (make sure that
>each seed that you generate this way is different and in the allowed range
>of seeds for your generator!).

Ted, your dissertation on random generators was very nice, but --- have you
checked the Swarm random library lately? ;-)

The default generator is Matsumoto's 'mt19937', which has a period of
2^19937 - 1 (about 10^6001)! This generator is considered very good. Unfortunately,
given the way it's seeded, there is no way to guarantee that two sequences
won't overlap, though I judge the probability of this happening to be
small. (With 2^32 possible seeds, starting points for sequences *should* on
average be 2^19905 apart ...)

As for generators whose period can be split into sub-sequences, such as
L'Ecuyer and Cote, Swarm has those too: C2LCGX has a total period of 2^60,
and C4LCGX has a total period of 2^120. Both are of decent quality, but for
some simulations at least their total period may well be too short. (If
someone comes up with a way to 'split' mt19937, I'll be happy to implement
that!)

It's definitely a good idea to save the random seeds used for experimental
runs, in case you have to rerun some particular case (perhaps because the
results were unexpected in some way.)
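
One way to keep track of seeds, sketched in Python rather than in Swarm's own API: derive a distinct seed for every run and write the list to a log. The helper names (make_seeds, write_seed_log), the 32-bit seed range, and the fixed base value are assumptions made for this illustration, not anything Swarm provides.

```python
import csv
import io
import random
import time

MAX_SEED = 2**32 - 1  # assumed seed range: unsigned 32-bit


def make_seeds(n_runs, base=None):
    """Return n_runs distinct seeds in [1, MAX_SEED], derived from a base value."""
    if base is None:
        base = int(time.time())        # clock-based starting point
    picker = random.Random(base)       # used only to pick the seeds themselves
    # sample() draws without replacement, so no two runs share a seed
    return picker.sample(range(1, MAX_SEED + 1), n_runs)


def write_seed_log(seeds, out):
    """Write one 'run,seed' row per run so any single run can be repeated later."""
    writer = csv.writer(out)
    writer.writerow(["run", "seed"])
    for run, seed in enumerate(seeds):
        writer.writerow([run, seed])


seeds = make_seeds(100, base=931481503)  # fixed base (hypothetical) for a repeatable example
log = io.StringIO()                      # stands in for a real log file
write_seed_log(seeds, log)
```

Rerunning a particular case is then a matter of looking up its row in the log and seeding the generator with that value.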

Sven Thommesen

>As long as the initial state of the random number generator is simply its
>initial seed and the generator is deterministic, then you can easily
>calculate an upper bound on the probability that two sequences with
>different initial seeds will overlap: It's simply the probability that one
>of the seeds falls within the sequence of random numbers that begins with
>the other seed. An upper bound on this probability is two times the
>maximum number of calls that you make to the generator in one run, divided
>by the period of the random number generator. Now multiply that
>probability by the square of the number of runs. The result is an upper
>bound of the probability that you will have at least one pair of
>overlapping runs. If this second probability is large (larger than say
>0.01), then you should use a different generator with a longer period.
>
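
The bound described above is easy to evaluate. Here is a minimal sketch in Python, assuming a hypothetical workload of 10^9 calls per run and 100 runs; the function names are made up for this example. Exact fractions are used because the value for mt19937 is far too small for a float.

```python
from fractions import Fraction
import math


def overlap_bound(calls_per_run, n_runs, period):
    """Upper bound on P(at least one pair of runs overlaps), as an exact fraction."""
    pair_bound = Fraction(2 * calls_per_run, period)  # bound for one pair of seeds
    return pair_bound * n_runs**2                     # times the square of the number of runs


def log10_fraction(p):
    """log10 of a positive Fraction; works even where float(p) would underflow to 0."""
    return math.log10(p.numerator) - math.log10(p.denominator)


calls, runs = 10**9, 100  # hypothetical workload

for name, period in [("2^60", 2**60), ("2^120", 2**120), ("2^19937 - 1", 2**19937 - 1)]:
    bound = overlap_bound(calls, runs, period)
    print(f"period {name}: bound is about 10^{log10_fraction(bound):.1f}")
```

With these figures a period of 2^60 gives a bound of roughly 1.7e-5, comfortably below the 0.01 threshold, and longer periods only make it smaller.
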
>So if the RNG period is long enough, you don't have to worry too much. And
>if your results are so shaky that one pair of overlapping runs makes a
>difference, then you shouldn't be publishing them anyway. :)
>
>If you're *really* worried about this, then you should test your RNG by
>seeding it 100 times, producing 100 sequences of random numbers that are
>as long as the sequences that your program uses, and checking that there's
>no significant correlation between the sequences.  In fact, you should
>check your RNG output even if you're only seeding the RNG once: Simply
>starting with the seed from the previous run won't guarantee independent
>sequences if your RNG is broken. And it's always good to replicate your
>results using a different type of RNG and a different set of seeds.
>
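
A rough version of that check, sketched in Python, whose standard random module is itself a Mersenne Twister. It only looks at the plain (lag-0) Pearson correlation between each pair of sequences, so it is a crude screen rather than a proper test suite; the seeds, sequence length, and function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def max_abs_correlation(seeds, length):
    """Largest |correlation| over all pairs of sequences, one sequence per seed."""
    streams = []
    for seed in seeds:
        rng = random.Random(seed)  # a separate generator for each seed
        streams.append([rng.random() for _ in range(length)])
    worst = 0.0
    for i in range(len(streams)):
        for j in range(i + 1, len(streams)):
            worst = max(worst, abs(pearson(streams[i], streams[j])))
    return worst


# Hypothetical seeds and length; in practice use the seeds and run length of the experiment.
worst = max_abs_correlation(seeds=range(1, 11), length=2000)
print(f"largest |correlation| over all pairs: {worst:.4f}")
```

For independent sequences of length n, the sample correlation scatters around zero with a standard deviation of about 1/sqrt(n), so values many times larger than that deserve a closer look.
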
>By the way, in some random number generators, you can easily jump n
>numbers ahead in a sequence. So you can just choose one seed and jump k *
>n numbers ahead from the seed at the beginning of each run, where k is a
>different integer for each run and n is the maximum number of times the
>generator is called in a single run. That will guarantee that each run
>uses a non-overlapping sequence of random numbers. One such generator is:
>
>L'Ecuyer, P., and S. Cote. (1991). Implementing a random number package
>with splitting facilities.  ACM Transactions on Mathematical Software
>17(1):98-111.
>
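
To illustrate the jump-ahead idea, here is a sketch in Python for a single multiplicative congruential generator, x -> A*x mod M. Jumping n steps is one modular exponentiation, since the n-th successor of x is (A^n mod M)*x mod M. The constants are, to my recollection, those of the first component of L'Ecuyer's combined generator; treat them as an assumption. The real package combines two such generators, and this is not its interface.

```python
M = 2147483563  # modulus (assumed: first component of L'Ecuyer's combined generator)
A = 40014       # multiplier (same assumption)


def step(x):
    """Advance the generator state by one call."""
    return (A * x) % M


def jump(x, n):
    """Advance the state by n calls at once, using modular exponentiation."""
    return (pow(A, n, M) * x) % M


def run_start(seed, k, n):
    """Starting state for run number k when no run makes more than n calls."""
    return jump(seed, k * n)


seed = 12345              # hypothetical seed, must lie in [1, M - 1]
calls_per_run = 1_000_000  # hypothetical maximum number of calls per run
starts = [run_start(seed, k, calls_per_run) for k in range(5)]
print(starts)
```

Run k then consumes states k*n through (k+1)*n - 1 of the one underlying sequence, so runs cannot overlap as long as the total number of calls stays below the period.
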
>Having all of the results in one file might be easier in some situations,
>but it's easy to write a Perl script to collate results from many separate
>files into one file and reformat them as needed.
>
>Finally, you don't have to do your analysis manually if you're using Unix:
>You can write a script that formats your results correctly and analyzes
>them in Mathematica or whatever. You can do the same thing on the Mac
>with MacPerl, Applescript, and Excel. If you're using Excel on Windows,
>you might be able to do the same thing with Visual Basic, Perl, or some
>other scripting language; I don't know (and don't want to know :) 
>-Ted
>
>--
>Ted Belding                              address@hidden 
>University of Michigan Center for the Study of Complex Systems
>Homepage: http://www-personal.umich.edu/~streak/
>PGP key:  http://www-personal.umich.edu/~streak/pgp-key.html
>
>                  ==================================
>   Swarm-Support is for discussion of the technical details of the day
>   to day usage of Swarm.  For list administration needs (esp.
>   [un]subscribing), please send a message to <address@hidden>
>   with "help" in the body of the message.




