[comp.programming] Re: Algorithm question: Generating random vectors sub

swarm-modeling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[comp.programming] Re: Algorithm question: Generating random vectors sub

From:	jalex
Subject:	[comp.programming] Re: Algorithm question: Generating random vectors subject to a constraint
Date:	10 Oct 1999 09:51:03 -0700

For those who might be interested, here's a response to the random
vector question from comp.programming.  The first chunk of the message
summarizes the problem and the various solutions that were put forth.
I haven't worked through the two solutions proposed, but thought they
should be posted.  (Method 5 sounds like it would tedious to
implement, though.)

I think the claim that Method 4 oversamples the edges is correct.
Consider some sample distributions created using it (each row must sum
to 10000):

4210    5375    1       25      65      256     0       1       5       0       
62
7995    1545    0       120     23      0       4       172     140     0       
1
105     299     5983    100     1967    273     40      7       10      12      
1204
6802    126     9       1       391     24      26      2439    175     0       
7
2       4       6861    0       337     1       1566    2       1226    1       0
4831    0       0       0       1093    3808    0       117     151     0       0
6737    4       338     0       809     1526    292     182     21      84      
7
0       504     123     4084    1       2       0       0       5244    31      
11
18      1       8658    250     13      27      8       5       799     0       
221
8457    8       116     291     13      376     5       71      138     520     
5
2       19      0       8       11      82      7895    0       0       1968    
15
1       1069    32      2       0       0       0       47      323     12      
8514
3       1133    17      4343    24      463     13      271     14      3654    
65
0       2090    11      1594    5382    762     97      52      9       0       
3
3       1241    92      1272    16      0       39      101     7102    1       
133

The large number of zeros appearing in each row seems indicative of
edge oversampling.  (This general pattern continues for the next 985
rows.)

Cheers,

Jason


------- Start of forwarded message -------
From: address@hidden (Richard Harter)
Newsgroups: comp.programming
Subject: Re: Algorithm question: Generating random vectors subject to a 
constraint
Date: Sun, 10 Oct 1999 08:36:36 GMT
Organization: The Internet Access Company, Inc.
Message-ID: <address@hidden>
References: <address@hidden>

address@hidden wrote:

>
>I have an algorithm design problem that I encounter semi-frequently and
>have yet to find a fully satisfactory solution.  Any help would be
>greatly appreciated.
>
>The problem: generate a random array v[n] of doubles subject to the
>following constraints:
>
>  1. 0.0 <= v[i] <= 1.0  for all i = 0, 1, ...,  n-1.
>
>  2. v[0] + ... + v[n-1] = 1.0
>
>  3. The method generating the random array v[n] should give a
>     uniform distribution over the space determined by (1) and (2).
>
>Now, there are a number of methods that come to mind immediately.
>
>
>Method 1. (bad)
>===============
>
>  sum = 0
>  for i=0,...,n-1 {
>    v[i] = random double between 0.0 and 1.0
>    sum += v[i]
>  }
>  for i=0,...,n-1 
>    v[i] = v[i]/sum
>
>Although this method does create an array satisfying conditions (1)
>and (2), it fails condition (3).  To see this, think about the simple case
>when n=2.  The algorithm basically takes any point in the unit square
>and orthogonally projects it onto the line from (1,0) to (0,1).  This
>means that points near the "middle" of the line are oversampled when
>compared to the ends.  This, I believe, generalizes to arbitrary n.

Correct.

>Method 2. (also bad)
>====================
>
>Numerical Recipes offers the following hint for uniformly sampling
>points from an oddly-shaped space V: find a space W _containing_ V
>which is easy to sample from uniformly, then do so, throwing away any
>points which don't fall within V.
>
>  in_space = FALSE
>  do {
>    sum = 0
>    for i=0,...,n-1 {
>      v[i] = random double between 0.0 and 1.0
>      sum += v[i]
>    }
>    if (sum==1.0)
>      in_space = TRUE
>  } while (in_space == FALSE)
>
>However, since the space V we are trying to sample from is only a
>hyperplane of the n-dimensional unit cube W that we are picking points
>out of, we have virtually _no_ chance of ever getting a valid vector
>v using this method.

Correct; however see below.

>Method 3. (also bad, but promising)
>===================================
>
>Since the sum of all the vectors has to be 1.0, keep track of how much
>is "left" after each random assignment.  
>
>  amount_left = 1.0
>  for i=0,...,n-1 {
>    v[i] = random double between 0.0 and amount_left
>    amount_left -= v[i]
>  }
>
>This method (I think) introduces bias towards the first element of the
>array.  It's much easier for v[0] to be large than v[n-1], since it
>only takes 1 high "roll of the dice" to get a large value of v[0] but
>n-2 low rolls to get a large value of v[n-1].

This isn't correct; it is not just a question of bias; the distribution
for v[i] for any i is not uniform.

>Method 4. (good?)
>=================
>
>Use the method above, but then randomly shuffle the array v
>afterwards.

This isn't right but it isn't biased towards any v[i]; the general
problem is that it oversamples the corners.

>I can't think of any immediate problems with this, but that doesn't
>mean that there aren't any. :-)

There are two methods that occur to me offhand.  One is correct and
messy; the other is easier to implement but I'm not certain that it is
completely correct.

Method 5:

We observe that the v[0] is distributed with the distribution function
(n-1)(1-x)^(n-2), x in [0,1].  Select v[0].  Similarly v[1] is
distributed  C(1-x)^n-3 , x in [1-v[0],1], for suitable C, v[2]
distributed C(1-x)^4, x in [1-v[0]-v[1],1], et cetera.  There is some
moderately messy algebra here and you have to know how to transform a
uniform distribution into a specified distribution.  The procedure is a
pain to set up but it is mathematically correct.

Method 6:

I haven't thought this out thoroughly but I think it may be correct.
Construct an orthonormal linear basis for the subspace orthogonal to
(1,1,...,1) and construct a box in this subspace containing the desired
region.  I haven't worked out a formula for this but it should be easy.
Select points in the enclosing box using independent uniform
distributions.  Add the selected point to (1/n,...,1/n) and use the
rejection criterion of method 2 (are all coordinates in the original
space in [0...1]).

Hope this helps.

  

Richard Harter, address@hidden, The Concord Research Institute
URL = http://www.tiac.net/users/cri, phone = 1-978-369-3911
London was like a beautifully dressed woman with dirty underwear.
-- Mary Brown, _Dragonne's EG_
------- End of forwarded message -------


                  ==================================
   Swarm-Modelling is for discussion of Simulation and Modelling techniques
   esp. using Swarm.  For list administration needs (esp. [un]subscribing),
   please send a message to <address@hidden> with "help" in the
   body of the message.
                  ==================================

[Prev in Thread]

Current Thread

[Next in Thread]

[comp.programming] Re: Algorithm question: Generating random vectors subject to a constraint, jalex <=

Prev by Date: Re: Randomly initializing a N-element vector
Next by Date: Re: how about this idea for sampling in the simplex?
Previous by thread: generating constrained random vectors - a proof ( i think)
Next by thread: Re: how about this idea for sampling in the simplex?
Index(es):
- Date
- Thread