[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[comp.programming] Re: Algorithm question: Generating random vectors sub
From: |
jalex |
Subject: |
[comp.programming] Re: Algorithm question: Generating random vectors subject to a constraint |
Date: |
10 Oct 1999 09:51:03 -0700 |
For those who might be interested, here's a response to the random
vector question from comp.programming. The first chunk of the message
summarizes the problem and the various solutions that were put forth.
I haven't worked through the two solutions proposed, but thought they
should be posted. (Method 5 sounds like it would tedious to
implement, though.)
I think the claim that Method 4 oversamples the edges is correct.
Consider some sample distributions created using it (each row must sum
to 10000):
4210 5375 1 25 65 256 0 1 5 0
62
7995 1545 0 120 23 0 4 172 140 0
1
105 299 5983 100 1967 273 40 7 10 12
1204
6802 126 9 1 391 24 26 2439 175 0
7
2 4 6861 0 337 1 1566 2 1226 1 0
4831 0 0 0 1093 3808 0 117 151 0 0
6737 4 338 0 809 1526 292 182 21 84
7
0 504 123 4084 1 2 0 0 5244 31
11
18 1 8658 250 13 27 8 5 799 0
221
8457 8 116 291 13 376 5 71 138 520
5
2 19 0 8 11 82 7895 0 0 1968
15
1 1069 32 2 0 0 0 47 323 12
8514
3 1133 17 4343 24 463 13 271 14 3654
65
0 2090 11 1594 5382 762 97 52 9 0
3
3 1241 92 1272 16 0 39 101 7102 1
133
The large number of zeros appearing in each row seems indicative of
edge oversampling. (This general pattern continues for the next 985
rows.)
Cheers,
Jason
------- Start of forwarded message -------
From: address@hidden (Richard Harter)
Newsgroups: comp.programming
Subject: Re: Algorithm question: Generating random vectors subject to a
constraint
Date: Sun, 10 Oct 1999 08:36:36 GMT
Organization: The Internet Access Company, Inc.
Message-ID: <address@hidden>
References: <address@hidden>
address@hidden wrote:
>
>I have an algorithm design problem that I encounter semi-frequently and
>have yet to find a fully satisfactory solution. Any help would be
>greatly appreciated.
>
>The problem: generate a random array v[n] of doubles subject to the
>following constraints:
>
> 1. 0.0 <= v[i] <= 1.0 for all i = 0, 1, ..., n-1.
>
> 2. v[0] + ... + v[n-1] = 1.0
>
> 3. The method generating the random array v[n] should give a
> uniform distribution over the space determined by (1) and (2).
>
>Now, there are a number of methods that come to mind immediately.
>
>
>Method 1. (bad)
>===============
>
> sum = 0
> for i=0,...,n-1 {
> v[i] = random double between 0.0 and 1.0
> sum += v[i]
> }
> for i=0,...,n-1
> v[i] = v[i]/sum
>
>Although this method does create an array satisfying conditions (1)
>and (2), it fails condition (3). To see this, think about the simple case
>when n=2. The algorithm basically takes any point in the unit square
>and orthogonally projects it onto the line from (1,0) to (0,1). This
>means that points near the "middle" of the line are oversampled when
>compared to the ends. This, I believe, generalizes to arbitrary n.
Correct.
>Method 2. (also bad)
>====================
>
>Numerical Recipes offers the following hint for uniformly sampling
>points from an oddly-shaped space V: find a space W _containing_ V
>which is easy to sample from uniformly, then do so, throwing away any
>points which don't fall within V.
>
> in_space = FALSE
> do {
> sum = 0
> for i=0,...,n-1 {
> v[i] = random double between 0.0 and 1.0
> sum += v[i]
> }
> if (sum==1.0)
> in_space = TRUE
> } while (in_space == FALSE)
>
>However, since the space V we are trying to sample from is only a
>hyperplane of the n-dimensional unit cube W that we are picking points
>out of, we have virtually _no_ chance of ever getting a valid vector
>v using this method.
Correct; however see below.
>Method 3. (also bad, but promising)
>===================================
>
>Since the sum of all the vectors has to be 1.0, keep track of how much
>is "left" after each random assignment.
>
> amount_left = 1.0
> for i=0,...,n-1 {
> v[i] = random double between 0.0 and amount_left
> amount_left -= v[i]
> }
>
>This method (I think) introduces bias towards the first element of the
>array. It's much easier for v[0] to be large than v[n-1], since it
>only takes 1 high "roll of the dice" to get a large value of v[0] but
>n-2 low rolls to get a large value of v[n-1].
This isn't correct; it is not just a question of bias; the distribution
for v[i] for any i is not uniform.
>Method 4. (good?)
>=================
>
>Use the method above, but then randomly shuffle the array v
>afterwards.
This isn't right but it isn't biased towards any v[i]; the general
problem is that it oversamples the corners.
>I can't think of any immediate problems with this, but that doesn't
>mean that there aren't any. :-)
There are two methods that occur to me offhand. One is correct and
messy; the other is easier to implement but I'm not certain that it is
completely correct.
Method 5:
We observe that the v[0] is distributed with the distribution function
(n-1)(1-x)^(n-2), x in [0,1]. Select v[0]. Similarly v[1] is
distributed C(1-x)^n-3 , x in [1-v[0],1], for suitable C, v[2]
distributed C(1-x)^4, x in [1-v[0]-v[1],1], et cetera. There is some
moderately messy algebra here and you have to know how to transform a
uniform distribution into a specified distribution. The procedure is a
pain to set up but it is mathematically correct.
Method 6:
I haven't thought this out thoroughly but I think it may be correct.
Construct an orthonormal linear basis for the subspace orthogonal to
(1,1,...,1) and construct a box in this subspace containing the desired
region. I haven't worked out a formula for this but it should be easy.
Select points in the enclosing box using independent uniform
distributions. Add the selected point to (1/n,...,1/n) and use the
rejection criterion of method 2 (are all coordinates in the original
space in [0...1]).
Hope this helps.
Richard Harter, address@hidden, The Concord Research Institute
URL = http://www.tiac.net/users/cri, phone = 1-978-369-3911
London was like a beautifully dressed woman with dirty underwear.
-- Mary Brown, _Dragonne's EG_
------- End of forwarded message -------
==================================
Swarm-Modelling is for discussion of Simulation and Modelling techniques
esp. using Swarm. For list administration needs (esp. [un]subscribing),
please send a message to <address@hidden> with "help" in the
body of the message.
==================================
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [comp.programming] Re: Algorithm question: Generating random vectors subject to a constraint,
jalex <=